ZetaSQL supports user-defined functions (UDFs). A UDF enables you to create a function using another SQL expression or another programming language, such as JavaScript or Lua. These functions accept columns of input and perform actions, returning the result of those actions as a value.
User-defined functions and table-valued functions in ZetaSQL use the following general syntax:
An SQL UDF lets you specify a SQL expression for the UDF.
Create SQL UDFs using the following syntax:
CREATE
[ { TEMPORARY | TEMP } ] [ AGGREGATE ] FUNCTION
function_name ( [ function_parameter [ NOT AGGREGATE ] [, ...] ] )
[ RETURNS data_type ]
AS ( sql_expression )
function_parameter:
parameter_name { data_type | ANY TYPE }
This syntax consists of the following components:
CREATE ... FUNCTION
: Creates a new function. A function can contain zero or morefunction_parameter
s.TEMPORARY
orTEMP
: Indicates that the function is temporary; that is it exists for the lifetime of the session.AGGREGATE
: Indicates that this is an aggregate function.function_parameter
: A parameter for the function. A parameter includes a name and a data type. The value ofdata_type
is a ZetaSQL data type or may also beANY TYPE
.RETURNS data_type
: Specifies the data type that the function returns. TheRETURNS
clause is optional and ZetaSQL infers the result type of the function from the SQL function body.AS (sql_expression)
: Specifies the SQL code that defines the function.NOT AGGREGATE
: Specifies that a parameter is not aggregate. Can only appear afterCREATE AGGREGATE FUNCTION
. A non-aggregate parameter can appear anywhere in the function definition.
An aggregate UDF can include aggregate or non-aggregate parameters. Like other
aggregate functions, aggregate UDFs normally aggregate
parameters across all rows in a group. However, you can specify
a parameter as non-aggregate with the NOT AGGREGATE
keyword. A non-aggregate
parameter is a scalar parameter with a constant value for all rows in a group;
for example, literals or grouping expressions are valid non-aggregate
parameters. Inside the UDF definition, aggregate parameters can only appear as
parameters to aggregate function calls. Non-aggregate parameters can appear
anywhere in the UDF definition.
A templated parameter can match more than one argument type at function call time. If a function signature includes a templated parameter, ZetaSQL allows function calls to pass one of several argument types to the function.
SQL user-defined function signatures can contain the following templated parameter value:
ANY TYPE
. The function will accept an argument of any type for this parameter. If more than one parameter includesANY TYPE
, a relationship is not enforced between these parameters when the function is defined. However, if the type of argument passed into the function at call time is incompatible with the function definition, this will result in an error.
The following example shows a UDF that employs a SQL function.
CREATE TEMP FUNCTION AddFourAndDivide(x INT64, y INT64)
RETURNS DOUBLE
AS ((x + 4) / y);
WITH numbers AS
(SELECT 1 as val UNION ALL
SELECT 3 as val UNION ALL
SELECT 4 as val UNION ALL
SELECT 5 as val)
SELECT val, AddFourAndDivide(val, 2) AS result
FROM numbers;
+-----+--------+
| val | result |
+-----+--------+
| 1 | 2.5 |
| 3 | 3.5 |
| 4 | 4 |
| 5 | 4.5 |
+-----+--------+
The following example shows an aggregate UDF that uses a non-aggregate
parameter. Inside the function definition, the aggregate SUM
method takes the
aggregate parameter dividend
, while the non-aggregate division operator
( /
) takes the non-aggregate parameter divisor
.
CREATE TEMP AGGREGATE FUNCTION ScaledSum(dividend DOUBLE, divisor DOUBLE NOT AGGREGATE)
RETURNS DOUBLE
AS (SUM(dividend) / divisor);
SELECT ScaledSum(col1, 2) AS scaled_sum
FROM (SELECT 1 AS col1 UNION ALL
SELECT 3 AS col1 UNION ALL
SELECT 5 AS col1);
+------------+
| scaled_sum |
+------------+
| 4.5 |
+------------+
The following example shows a SQL UDF that uses a templated parameter. The resulting function accepts arguments of various types.
CREATE TEMP FUNCTION AddFourAndDivideAny(x ANY TYPE, y ANY TYPE)
AS (
(x + 4) / y
);
SELECT AddFourAndDivideAny(3, 4) AS integer_output,
AddFourAndDivideAny(1.59, 3.14) AS floating_point_output;
+----------------+-----------------------+
| integer_output | floating_point_output |
+----------------+-----------------------+
| 1.75 | 1.7802547770700636 |
+----------------+-----------------------+
The following example shows a SQL UDF that uses a templated parameter to return the last element of an array of any type.
CREATE TEMP FUNCTION LastArrayElement(arr ANY TYPE)
AS (
arr[ORDINAL(ARRAY_LENGTH(arr))]
);
SELECT
names[OFFSET(0)] AS first_name,
LastArrayElement(names) AS last_name
FROM (
SELECT ['Fred', 'McFeely', 'Rogers'] AS names UNION ALL
SELECT ['Marie', 'Skłodowska', 'Curie']
);
+------------+-----------+
| first_name | last_name |
+------------+-----------+
| Fred | Rogers |
| Marie | Curie |
+------------+-----------+
An external UDF lets you specify external code libraries for the UDF.
Create external UDFs using the following syntax.
CREATE
[ { TEMPORARY | TEMP } ] FUNCTION
function_name ( [ function_parameter [, ...] ] )
RETURNS data_type
[ determinism_specifier ]
LANGUAGE language_name AS string_literal
function_parameter:
parameter_name data_type
determinism_specifier:
{ IMMUTABLE | DETERMINISTIC | NOT DETERMINISTIC | VOLATILE | STABLE }
This syntax consists of the following components:
CREATE ... FUNCTION
: Creates a new function. A function can contain zero or morefunction_parameter
s.TEMPORARY
orTEMP
: Indicates that the function is temporary; that is it exists for the lifetime of the session.function_parameter
: A parameter for the function. A parameter includes a name and a data type. The value ofdata_type
is a ZetaSQL data type.determinism_specifier
: Determines if the results of the function can be cached or not. Your choices are:IMMUTABLE
orDETERMINISTIC
: The function always returns the same result when passed the same arguments. The query result is potentially cacheable. For example, if the functionadd_one(i)
always returnsi + 1
, the function is deterministic.NOT DETERMINISTIC
: The function does not always return the same result when passed the same arguments. TheVOLATILE
andSTABLE
keywords are subcategories ofNOT DETERMINISTIC
.VOLATILE
: The function does not always return the same result when passed the same arguments, even within the same run of a query statement. For example ifadd_random(i)
returnsi + rand()
, the function is volatile, because every call to the function can return a different result.STABLE
: Within one run of a query statement, the function will consistently return the same result for the same argument values. However, the result could change across two runs. For example, a function that returns the current session user is stable, because the value would not change within the same run.
RETURNS data_type
: Specifies the data type that the function returns. See Supported UDF data types for more information about allowed values fordata_type
.LANGUAGE ... AS
: Specify the language and code to use.language_name
represents the name of the language, such asjs
for JavaScript.string_literal
represents the code that defines the function body.
The following example creates a persistent UDF.
CREATE FUNCTION MultiplyInputs(x DOUBLE, y DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x*y;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x, y, MultiplyInputs(x, y) as product
FROM numbers;
+-----+-----+--------------+
| x | y | product |
+-----+-----+--------------+
| 1 | 5 | 5 |
| 2 | 10 | 20 |
| 3 | 15 | 45 |
+-----+-----+--------------+
The following example creates a temporary UDF.
CREATE TEMP FUNCTION MultiplyInputs(x DOUBLE, y DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x*y;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x, y, MultiplyInputs(x, y) as product
FROM numbers;
+-----+-----+--------------+
| x | y | product |
+-----+-----+--------------+
| 1 | 5 | 5 |
| 2 | 10 | 20 |
| 3 | 15 | 45 |
+-----+-----+--------------+
You can create multiple UDFs before a query. For example:
CREATE TEMP FUNCTION MultiplyInputs(x DOUBLE, y DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x*y;
""";
CREATE TEMP FUNCTION DivideByTwo(x DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x / 2;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x,
y,
MultiplyInputs(x, y) as product,
DivideByTwo(x) as half_x,
DivideByTwo(y) as half_y
FROM numbers;
+-----+-----+--------------+--------+--------+
| x | y | product | half_x | half_y |
+-----+-----+--------------+--------+--------+
| 1 | 5 | 5 | 0.5 | 2.5 |
| 2 | 10 | 20 | 1 | 5 |
| 3 | 15 | 45 | 1.5 | 7.5 |
+-----+-----+--------------+--------+--------+
You can pass the result of a UDF as input to another UDF. For example:
CREATE TEMP FUNCTION MultiplyInputs(x DOUBLE, y DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x*y;
""";
CREATE TEMP FUNCTION DivideByTwo(x DOUBLE)
RETURNS DOUBLE
LANGUAGE js AS """
return x/2;
""";
WITH numbers AS
(SELECT 1 AS x, 5 as y
UNION ALL
SELECT 2 AS x, 10 as y
UNION ALL
SELECT 3 as x, 15 as y)
SELECT x,
y,
MultiplyInputs(DivideByTwo(x), DivideByTwo(y)) as half_product
FROM numbers;
+-----+-----+--------------+
| x | y | half_product |
+-----+-----+--------------+
| 1 | 5 | 1.25 |
| 2 | 10 | 5 |
| 3 | 15 | 11.25 |
+-----+-----+--------------+
The following example sums the values of all fields named "foo" in the given JSON string.
CREATE TEMP FUNCTION SumFieldsNamedFoo(json_row STRING)
RETURNS FLOAT64
LANGUAGE js AS """
function SumFoo(obj) {
var sum = 0;
for (var field in obj) {
if (obj.hasOwnProperty(field) && obj[field] != null) {
if (typeof obj[field] == "object") {
sum += SumFoo(obj[field]);
} else if (field == "foo") {
sum += obj[field];
}
}
}
return sum;
}
var row = JSON.parse(json_row);
return SumFoo(row);
""";
WITH Input AS (
SELECT STRUCT(1 AS foo, 2 AS bar, STRUCT('foo' AS x, 3.14 AS foo) AS baz) AS s, 10 AS foo UNION ALL
SELECT NULL, 4 AS foo UNION ALL
SELECT STRUCT(NULL, 2 AS bar, STRUCT('fizz' AS x, 1.59 AS foo) AS baz) AS s, NULL AS foo
)
SELECT
TO_JSON_STRING(t) AS json_row,
SumFieldsNamedFoo(TO_JSON_STRING(t)) AS foo_sum
FROM Input AS t;
+---------------------------------------------------------------------+---------+
| json_row | foo_sum |
+---------------------------------------------------------------------+---------+
| {"s":{"foo":1,"bar":2,"baz":{"x":"foo","foo":3.14}},"foo":10} | 14.14 |
| {"s":null,"foo":4} | 4 |
| {"s":{"foo":null,"bar":2,"baz":{"x":"fizz","foo":1.59}},"foo":null} | 1.59 |
+---------------------------------------------------------------------+---------+
You must enclose external code in quotes. There are a few options:
"..."
: For simple, one line code snippets that don't contain quotes or escaping, you can use a standard quoted string."""..."""
: If the snippet contains quotes or multiple lines, use triple-quoted blocks.R"""..."""
: If the snippet contains escaping, prefix a triple-quoted block with anR
to indicate that this is a raw string that should ignore escaping rules. If you are not sure which quoting style to use, this one will provide the most consistent results.
Examples
CREATE TEMP FUNCTION PlusOne(x DOUBLE)
RETURNS DOUBLE
LANGUAGE js
AS "return x+1;";
SELECT val, PlusOne(val) AS result
FROM UNNEST([1, 2, 3]) AS val;
+-----------+-----------+
| val | result |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
+-----------+-----------+
CREATE TEMP FUNCTION CustomGreeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
var d = new Date();
if (d.getHours() < 12) {
return 'Good Morning, ' + a + '!';
} else {
return 'Good Evening, ' + a + '!';
}
""";
SELECT CustomGreeting(names) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS names;
+-----------------------+
| everyone |
+-----------------------+
| Good Morning, Hannah! |
| Good Morning, Max! |
| Good Morning, Jakob! |
+-----------------------+
CREATE TEMP FUNCTION PlusOne(x STRING)
RETURNS STRING
LANGUAGE js AS R"""
var re = /[a-z]/g;
return x.match(re);
""";
SELECT val, PlusOne(val) AS result
FROM UNNEST(["ab-c", "d_e", "!"]) AS val;
+---------+
| result |
+---------+
| [a,b,c] |
| [d,e] |
| NULL |
+---------+
A table-valued function (TVF) is a function that returns an entire output table instead of a single scalar value, and appears in the FROM clause like a table subquery.
You create a TVF using the following structure.
CREATE
{ TEMPORARY | TEMP } TABLE FUNCTION
function_name ( [ function_parameter [, ...] ] )
[ RETURNS TABLE < column_declaration [, ...] > ]
[ { AS query | LANGUAGE language_name AS string_literal } ]
function_parameter:
parameter_name { data_type | ANY TABLE }
column_declaration:
column_name data_type
CREATE ... TABLE FUNCTION
: Creates a new table-valued function function. A function can contain zero or morefunction_parameter
s.TEMPORARY
orTEMP
: Indicates that the function is temporary; that is it exists for the lifetime of the session.function_parameter
: A parameter for the function. A parameter includes a name and a data type. The value ofdata_type
is a ZetaSQL data type. The value ofdata_type
may also beANY TABLE
.RETURNS TABLE
: Specifies the schema of the table that a table-valued function returns as a comma-separated list ofcolumn_name
andTYPE
pairs. IfRETURNS TABLE
is absent, ZetaSQL infers the output schema from theAS query
statement in the function body.AS query
: If you want to create a SQL TVF, specifies the SQL query to run.LANGUAGE ... AS
: If you want to create an external TVF, specifies the language and code to use.language_name
represents the name of the language, such asjs
for JavaScript.string_literal
represents the code that defines the function body.
When a TVF with parameters is called, arguments must be passed in for all parameters that do not have defaults. An argument can be of any supported ZetaSQL type or table, but must be coercible to the related function parameter's type.
Specify a table argument the same way you specify the fields of a STRUCT.
parameter_name TABLE<column_name data_type [, ...]>
The table argument can specify a value table,
in which each row
is a single column of a specific type. To specify a value table as an argument,
include only the data_type
, leaving out the column_name
:
parameter_name TABLE<data_type>
In many cases, the data_type
of the single column in the value table is a
protocol buffer; for example:
CREATE TEMP TABLE FUNCTION AggregatedMovieLogs(
TicketPurchases TABLE<analysis_conduit.codelab.MovieTicketPurchase>)
The function body can refer directly to fields within the proto.
You have the option to specify the input table using the templated type ANY TABLE
in place of TABLE<column_name data_type [, ...]>
. This option enables
you to create a polymorphic TVF that accepts any table as input.
Example
The following example implements a pair of TVFs that define parameterized views of a range of rows from the Customer table. The first returns all rows for a range of CustomerIds; the second calls the first function and applies an additional filter based on CustomerType.
CREATE TEMP TABLE FUNCTION CustomerRange(MinID INT64, MaxID INT64)
AS
SELECT *
FROM Customer
WHERE CustomerId >= MinId AND CustomerId <= MaxId;
CREATE TEMP TABLE FUNCTION CustomerRangeWithCustomerType(
MinId INT64,
MaxId INT64,
customer_type ads.boulder.schema.CustomerType)
AS
SELECT * FROM CustomerRange(MinId, MaxId)
WHERE type = customer_type;
SQL table-valued function signatures can contain the following templated
parameter values for param_type
:
ANY TYPE
. The function will accept an input of any type for this argument. If more than one parameter has the typeANY TYPE
, ZetaSQL does not enforce any relationship between these arguments at the time of function creation. However, passing the function arguments of types that are incompatible with the function definition will result in an error at call time.ANY TABLE
. The function will accept an argument of any relation type for this argument. However, passing the function arguments of types that are incompatible with the function definition will result in an error at call time.
Examples
The following function returns all rows from the input table if the first argument is greater than the second argument; otherwise it returns no rows.
CREATE TEMP TABLE FUNCTION MyFunction(
first ANY TYPE,
second ANY TYPE,
third ANY TABLE)
AS
SELECT *
FROM third
WHERE first > second;
The following function accepts two integers and a table with any set of columns
and returns rows from the table where the predicate evaluates to true. The input
table selected_customers
must contain a column named creation_time
, and
creation_time
must be a numeric type, or the function will return an error.
CREATE TEMP TABLE FUNCTION CustomerCreationTimeRange(
min_creation_time INT64,
max_creation_time INT64,
selected_customers ANY TABLE)
AS
SELECT *
FROM selected_customers
WHERE creation_time >= min_creation_time
AND creation_time <= max_creation_time;
To call a TVF, use the function call in place of the table name in a FROM
clause.
There are two ways to pass a table as an argument to a TVF. You can use a
subquery for the table argument, or you can use the name of a table, preceded by
the keyword TABLE
.
Examples
The following query calls the CustomerRangeWithCustomerType
function to
return a table with rows for customers with a CustomerId between 100
and 200.
SELECT CustomerId, Info
FROM CustomerRangeWithCustomerType(100, 200, 'CUSTOMER_TYPE_ADVERTISER');
The following query calls the CustomerCreationTimeRange
function defined
previously, passing the result of a subquery as the table argument.
SELECT *
FROM
CustomerCreationTimeRange(
1577836800, -- 2020-01-01 00:00:00 UTC
1609459199, -- 2020-12-31 23:59:59 UTC
(
SELECT customer_id, customer_name, creation_time
FROM MyCustomerTable
WHERE customer_name LIKE '%Hernández'
))
The following query calls CustomerCreationTimeRange
, passing the table
MyCustomerTable
as an argument.
SELECT *
FROM
CustomerCreationTimeRange(
1577836800, -- 2020-01-01 00:00:00 UTC
1609459199, -- 2020-12-31 23:59:59 UTC
TABLE MyCustomerTable)