Skip to content

Commit

Permalink
Export of internal ZetaSQL changes.
Browse files Browse the repository at this point in the history
GitOrigin-RevId: 6c6fab8f816cac8828fc8201478c85043e19c106
Change-Id: I69dc4fd4dc7463425e0a9fec8ed61a1aa9b6c41b
  • Loading branch information
ZetaSQL Team authored and KimiWaRokkuWoKikanai committed Mar 4, 2024
1 parent 589026c commit b7b67f5
Show file tree
Hide file tree
Showing 606 changed files with 100,992 additions and 39,286 deletions.
1,361 changes: 800 additions & 561 deletions docs/aggregate-dp-functions.md

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions docs/aggregate_functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1058,10 +1058,10 @@ window_specification:
Returns the count of `TRUE` values for `expression`. Returns `0` if there are
zero input rows, or if `expression` evaluates to `FALSE` or `NULL` for all rows.

Since `expression` must be a `BOOL`, the form
`COUNTIF(DISTINCT ...)` is generally not useful: there is only one distinct
value of `TRUE`. So `COUNTIF(DISTINCT ...)` will return 1 if `expression`
evaluates to `TRUE` for one or more input rows, or 0 otherwise.
Since `expression` must be a `BOOL`, the form `COUNTIF(DISTINCT ...)` is
generally not useful: there is only one distinct value of `TRUE`. So
`COUNTIF(DISTINCT ...)` will return 1 if `expression` evaluates to `TRUE` for
one or more input rows, or 0 otherwise.
Usually when someone wants to combine `COUNTIF` and `DISTINCT`, they
want to count the number of distinct values of an expression for which a certain
condition is satisfied. One recipe to achieve this is the following:
Expand Down
177 changes: 177 additions & 0 deletions docs/array_functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,15 @@ ZetaSQL supports the following array functions.
</td>
</tr>

<tr>
<td><a href="#array_zip"><code>ARRAY_ZIP</code></a>

</td>
<td>
Combines elements from two to four arrays into one array.
</td>
</tr>

<tr>
<td><a href="#flatten"><code>FLATTEN</code></a>

Expand Down Expand Up @@ -1305,6 +1314,174 @@ SELECT

[lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas

### `ARRAY_ZIP`

```sql
ARRAY_ZIP(
array_input [ AS alias ],
array_input [ AS alias ][, ... ]
[, transformation => lambda_expression ]
[, mode => { 'STRICT' | 'TRUNCATE' | 'PAD' } ]
)
```

**Description**

Combines the elements from two to four arrays into one array.

**Definitions**

+ `array_input`: An input `ARRAY` value to be zipped with the other array
inputs. `ARRAY_ZIP` supports two to four input arrays.
+ `alias`: An alias optionally supplied for an `array_input`. In the results,
the alias is the name of the associated `STRUCT` field.
+ `transformation`: A optionally-named lambda argument. `lambda_expression`
specifies how elements are combined as they are zipped. This overrides
the default `STRUCT` creation behavior.
+ `mode`: A mandatory-named argument that determines how arrays of differing
lengths are zipped. If this optional argument is not supplied, the function
uses `STRICT` mode by default. This argument can be one of the following
values:

+ `STRICT` (default): If the length of any array is different from the
others, produce an error.

+ `TRUNCATE`: Truncate longer arrays to match the length of the shortest
array.

+ `PAD`: Pad shorter arrays with `NULL` values to match the length of the
longest array.

**Details**

+ If an `array_input` or `mode` is `NULL`, this function returns `NULL`, even when
`mode` is `STRICT`.
+ Argument aliases can't be used with the `transformation` argument.

**Return type**

+ If `transformation` is used and `lambda_expression` returns type `T`, the
return type is `ARRAY<T>`.
+ Otherwise, the return type is `ARRAY<STRUCT>`, with the `STRUCT` having a
number of fields equal to the number of input arrays. Each field's name is
either the user-provided `alias` for the corresponding `array_input`, or a
default alias assigned by the compiler, following the same logic used for
[naming columns in a SELECT list][implicit-aliases].

**Examples**

The following query zips two arrays into one:

```sql
SELECT ARRAY_ZIP([1, 2], ['a', 'b']) AS results

/*----------------------*
| results |
+----------------------+
| [(1, 'a'), (2, 'b')] |
*----------------------*/
```

You can give an array an alias. For example, in the following
query, the returned array is of type `ARRAY<STRUCT<A1, alias_inferred>>`,
where:

+ `A1` is the alias provided for array `[1, 2]`.
+ `alias_inferred` is the inferred alias provided for array `['a', 'b']`.

```sql
WITH T AS (
SELECT ['a', 'b'] AS alias_inferred
)
SELECT ARRAY_ZIP([1, 2] AS A1, alias_inferred) AS results
FROM T

/*----------------------------------------------------------+
| results |
+----------------------------------------------------------+
| [{1 A1, 'a' alias_inferred}, {2 A1, 'b' alias_inferred}] |
+----------------------------------------------------------*/
```

To provide a custom transformation of the input arrays, use the `transformation`
argument:

```sql
SELECT ARRAY_ZIP([1, 2], [3, 4], transformation => (e1, e2) -> (e1 + e2))

/*---------+
| results |
+---------+
| [4, 6] |
+---------*/
```

The argument name `transformation` is not required. For example:

```sql
SELECT ARRAY_ZIP([1, 2], [3, 4], (e1, e2) -> (e1 + e2))

/*---------+
| results |
+---------+
| [4, 6] |
+---------*/
```

When `transformation` is provided, the input arrays are not allowed to have
aliases. For example, the following query is invalid:

```sql {.bad}
-- Error: ARRAY_ZIP function with lambda argument does not allow providing
-- argument aliases
SELECT ARRAY_ZIP([1, 2], [3, 4] AS alias_not_allowed, (e1, e2) -> (e1 + e2))
```

To produce an error when arrays with different lengths are zipped, don't
add `mode`, or if you do, set it as `STRICT`. For example:

```sql {.bad}
-- Error: Unequal array length
SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd']) AS results
```

```sql {.bad}
-- Error: Unequal array length
SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'STRICT') AS results
```

Use the `PAD` mode to pad missing values with `NULL` when input arrays have
different lengths. For example:

```sql
SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], [], mode => 'PAD') AS results

/*------------------------------------------------------------------------+
| results |
+------------------------------------------------------------------------+
| [{1, 'a', NULL}, {2, 'b', NULL}, {NULL, 'c', NULL}, {NULL, 'd', NULL}] |
+------------------------------------------------------------------------*/
```

Use the `TRUNCATE` mode to truncate all arrays that are longer than the shortest
array. For example:

```sql
SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'TRUNCATE') AS results

/*----------------------*
| results |
+----------------------+
| [(1, 'a'), (2, 'b')] |
*----------------------*/
```

<!-- mdlint off(WHITESPACE_LINE_LENGTH) -->

[implicit-aliases]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#implicit_aliases

<!-- mdlint on -->

### `FLATTEN`

```sql
Expand Down
70 changes: 1 addition & 69 deletions docs/arrays.md
Original file line number Diff line number Diff line change
Expand Up @@ -1233,75 +1233,7 @@ consisting of pairs of elements from input arrays, taken from their
corresponding positions. This operation is sometimes called
[zipping][convolution].

You can zip arrays with `UNNEST` and `WITH OFFSET`. In this example, each
value pair is stored as a `STRUCT` in an array.

```sql
WITH
Combinations AS (
SELECT
['a', 'b'] AS letters,
[1, 2, 3] AS numbers
)
SELECT
ARRAY(
SELECT AS STRUCT
letters[SAFE_OFFSET(index)] AS letter,
numbers[SAFE_OFFSET(index)] AS number
FROM Combinations
CROSS JOIN
UNNEST(
GENERATE_ARRAY(
0,
LEAST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
ORDER BY index
);

/*------------------------------*
| pairs |
+------------------------------+
| [{ letter: "a", number: 1 }, |
| { letter: "b", number: 2 }] |
*------------------------------*/
```

You can use input arrays of different lengths as long as the first array
is equal to or less than the length of the second array. The zipped array
will be the length of the shortest input array.

To get a zipped array that includes all the elements even when the input arrays
are different lengths, change `LEAST` to `GREATEST`. Elements of either array
that have no associated element in the other array will be paired with `NULL`.

```sql
WITH
Combinations AS (
SELECT
['a', 'b'] AS letters,
[1, 2, 3] AS numbers
)
SELECT
ARRAY(
SELECT AS STRUCT
letters[SAFE_OFFSET(index)] AS letter,
numbers[SAFE_OFFSET(index)] AS number
FROM Combinations
CROSS JOIN
UNNEST(
GENERATE_ARRAY(
0,
GREATEST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
ORDER BY index
);

/*-------------------------------*
| pairs |
+-------------------------------+
| [{ letter: "a", number: 1 }, |
| { letter: "b", number: 2 }, |
| { letter: null, number: 3 }] |
*-------------------------------*/
```
You can zip arrays with the function [`ARRAY_ZIP`]([array-zip]).

## Building arrays of arrays

Expand Down
2 changes: 1 addition & 1 deletion docs/conditional_expressions.md
Original file line number Diff line number Diff line change
Expand Up @@ -340,7 +340,7 @@ SELECT NULLIFZERO(0) AS result
### `ZEROIFNULL`

```sql
NULLIFZERO(expr)
ZEROIFNULL(expr)
```

**Description**
Expand Down
16 changes: 15 additions & 1 deletion docs/conversion_rules.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ literals and parameters can also be coerced. See
<td><code>STRING</code></td>
<td>

<span><code>BOOL</code></span><br /><span><code>INT32</code></span><br /><span><code>INT64</code></span><br /><span><code>UINT32</code></span><br /><span><code>UINT64</code></span><br /><span><code>NUMERIC</code></span><br /><span><code>BIGNUMERIC</code></span><br /><span><code>FLOAT</code></span><br /><span><code>DOUBLE</code></span><br /><span><code>STRING</code></span><br /><span><code>BYTES</code></span><br /><span><code>DATE</code></span><br /><span><code>DATETIME</code></span><br /><span><code>TIME</code></span><br /><span><code>TIMESTAMP</code></span><br /><span><code>ENUM</code></span><br /><span><code>PROTO</code></span><br />
<span><code>BOOL</code></span><br /><span><code>INT32</code></span><br /><span><code>INT64</code></span><br /><span><code>UINT32</code></span><br /><span><code>UINT64</code></span><br /><span><code>NUMERIC</code></span><br /><span><code>BIGNUMERIC</code></span><br /><span><code>FLOAT</code></span><br /><span><code>DOUBLE</code></span><br /><span><code>STRING</code></span><br /><span><code>BYTES</code></span><br /><span><code>DATE</code></span><br /><span><code>DATETIME</code></span><br /><span><code>TIME</code></span><br /><span><code>TIMESTAMP</code></span><br /><span><code>ENUM</code></span><br /><span><code>PROTO</code></span><br /><span><code>RANGE</code></span><br />

</td>
<td>&nbsp;</td>
Expand Down Expand Up @@ -247,6 +247,15 @@ literals and parameters can also be coerced. See
<td><code>PROTO</code> (with the same <code>PROTO</code> name)</td>
</tr>

<tr>
<td><code>RANGE</code></td>
<td>

<span><code>RANGE</code></span><br /><span><code>STRING</code></span><br />
</td>
<td>&nbsp;</td>
</tr>

</tbody>
</table>

Expand Down Expand Up @@ -581,6 +590,11 @@ or more supertypes, including itself, which defines its set of supertypes.
</td>
</tr>

<tr>
<td><code>RANGE</code></td>
<td><code>RANGE</code> with the same subtype.</td>
</tr>

</tbody>
</table>

Expand Down
Loading

0 comments on commit b7b67f5

Please sign in to comment.