The following sections provide an overview of the ZetaSQL data model.
ZetaSQL data is stored in tables. Each table consists of an ordered list of columns and a number of rows. Each column has a name used to identify it through SQL statements, and is assigned a specific data type.
For example, the following table, Singers, is a standard SQL table.
Column Name | Data Type | Default Value |
---|---|---|
SingerId | INT64 |
<auto-increment> |
FirstName | STRING |
|
LastName | STRING |
|
BirthDate | DATE |
|
Status | STRING |
"active" |
SingerInfo | PROTO<SingerMetadata> |
|
Albums | PROTO<Album> |
The proto, SingerMetadata
, has the following definition:
message SingerMetadata {
optional string nationality = 1;
repeated Residence residence = 2;
message Residence {
required int64 start_year = 1;
optional int64 end_year = 2;
optional string city = 3;
optional string country = 4 [default = "USA"];
}
}
A SELECT *
statement on this table would return rows similar to the following:
SingerId | FirstName | LastName | BirthDate | Status | SingerInfo |
---|---|---|---|---|---|
1 | Marc | Richards | 1970-09-03 | active | {nationality: "England"} |
2 | Catalina | Smith | 1990-08-17 | inactive | {nationality: "U.S.A."} |
3 | Lea | Martin | 1991-11-09 | active | {nationality: "Australia"} |
4 | Xanathe | Riou | 1995-05-23 | inactive | {nationality: U.S.A."} |
While tables do not have a type, some operations will construct an implicit
STRUCT
type out of a SQL row, using the column names and types for field
definitions.
For more information on the data types ZetaSQL supports, see Data Types.
Constraints require that any writes to one or more columns, such as inserts or updates, conform to certain rules. Data manipulation language (DML) statements enforce constraints. ZetaSQL supports the following constraints:
-
Primary key constraint. A primary key consists of one or more columns, and specifies that the value of each row of these combined columns must be unique within that table. A table can contain at most one primary key constraint.
Some data manipulation language (DML) keywords may require the existence of a primary key. ZetaSQL also implicitly builds an index on the primary key. The default order of this index is ascending. The primary key can contain
NULL
values. -
Unique constraint. Specifies that one or more columns must contain only unique values. Unlike a primary key, more than one unique constraint can exist on a table.
An index allows the database engine to query a column or set of columns more quickly. You can specify that sort order is ascending or descending. A unique or primary key index defines an indexed column that is subject to the uniqueness constraint.
ZetaSQL tables support pseudo-columns. Pseudo-columns contain data elements that you can query like regular columns, but are not considered real columns in the table. Pseudo-column values may not be physically stored with each row, but the query engine will materialize a value for the column using some appropriate mechanism.
For example, an engine might support a pseudo-column called ROWNUM
, which
returns a number indicating the order in which a row was returned. You can then
construct a query like this:
SELECT ROWNUM, SingerId, FirstName, LastName FROM Singers
WHERE Status = "active"
Here's an example of rows returned by this query:
ROWNUM | SingerId | FirstName | LastName |
---|---|---|---|
1 | 1 | Marc | Richards |
2 | 3 | Lea | Martin |
In this case,the schema for the Singers table does not define a column,
ROWNUM
. Instead, the engine materializes the data only when requested.
To return a value of a pseudo-column, you must specify it in your query.
Pseudo-columns do not show up in SELECT *
statements. For example:
SELECT * FROM singers
This query will return all named columns in the table, but won't include
pseudo-columns such as ROWNUM
.
In addition to standard SQL tables, ZetaSQL supports value tables. In a value table, rather than having rows made up of a list of columns, each row is a single value of a specific type. These types of tables are common when working with protocol buffers that may be stored in files instead of in the database.
For example, the following protocol buffer definition, AlbumReview
, contains
data about the reviews for an album.
message AlbumReview {
optional string albumtitle = 1;
optional string reviewer = 2;
optional string review = 3;
}
A list of AlbumReview
protocol buffers is stored in a file, AlbumReviewData
.
{albumtitle: "Songs on a Broken Banjo", reviewer: "Dan Starling", review: "Off key"}
{albumtitle: "Six and Seven", reviewer: "Alice Wayfarer", review: "Hurt my ears!"}
{albumtitle: "Go! Go! Go!", reviewer: "Eustace Millson", review: "My kids loved it!"}
The following query returns a stream of rows, with each row a value of type AlbumReview.
SELECT a FROM AlbumReviewsData a
To get specific data, such as all album titles in
the table, you have two options. You can specify albumtitle
as a protocol
buffer field:
SELECT a.albumtitle FROM AlbumReviewsData a
You can also access the top-level fields inside the value (if there are any) like columns in a regular SQL table:
SELECT albumtitle FROM AlbumReviewsData
Value tables are not limited for use with compound data types. A value table can consist of any supported ZetaSQL data type, although value tables consisting of scalar types occur less frequently than structs or protocol buffers.
You can use ZetaSQL to return query results as a value table. This is useful
when you want to create a compound value, such as a protocol buffer, from a
query result and store it as a table that acts like a value table.
To return a query result as a
value table, use the SELECT AS
statement. See
Query Syntax
for more information and examples.
In some cases you might not want to work with the data within a protocol buffer, but with the protocol buffer itself.
Using SELECT AS VALUE
can help you keep your ZetaSQL statements as simple
as possible. To illustrate this, consider the AlbumReview
example specified earlier. To create a new table from this data, you could
write:
CREATE TABLE Reviews AS
SELECT albumreviews FROM AlbumReviewData albumreviews;
This statement creates a standard SQL table that has a single column,
albumreviews
, which has a protocol buffer value of type
AlbumReviewData
. To retrieve all album titles from this table, you'd need to
write a query similar to:
SELECT r.albumreviews.albumtitle
FROM Reviews r;
Now, consider the same initial CREATE TABLE
statement, this time modified to
use SELECT AS VALUE
:
CREATE TABLE Reviews AS
SELECT AS VALUE albumreviews FROM AlbumReview albumreviews;
This statement creates a value table, instead of a standard SQL table. As a result, you can query any protocol buffer field as if it was a column. Now, if you want to retrieve all album titles from this table, you can write a much simpler query:
SELECT albumtitle
FROM Reviews;
Normally, a SET
operation like UNION ALL
expects all tables to be either
standard SQL tables or value tables. However, ZetaSQL allows you to combine
standard SQL tables with value tables—provided that the standard SQL table
consists of a single column with a type that matches the value table's type. The
result of these operations is always a value table.
For example, consider the following definition for a table, SingersAndAlbums.
Column Name | Data Type |
---|---|
SingerId | INT64 |
AlbumId | INT64 |
AlbumReview | PROTO<AlbumReview> |
Next, we have a file, AlbumReviewData
that contains a list of AlbumReview
protocol buffers.
{albumtitle: "Songs on a Broken Banjo", reviewer: "Dan Starling", review: "Off key"}
{albumtitle: "Six and Seven", reviewer: "Alice Wayfarer", review: "Hurt my ears!"}
{albumtitle: "Go! Go! Go!", reviewer: "Eustace Millson", review: "My kids loved it!"}
The following query combines the AlbumReview
data from the
SingersAndAlbums with the data stored in the AlbumReviewData
file and
stores it in a new value table, AllAlbumReviews.
SELECT AS VALUE sa.AlbumReview FROM SingersAndAlbums sa
UNION ALL
SELECT a FROM AlbumReviewData a
The Pseudo-columns section describes how pseudo-columns work with standard SQL tables. In most cases, pseudo-columns work the same with value tables. For example, consider this query:
SELECT a.ROWNUM, a.albumtitle AS title FROM AlbumReviewData a
The following table demonstrates the result of this query:
ROWNUM | title |
---|---|
1 | "Songs on a Broken Banjo" |
2 | "Six and Seven" |
3 | "Go! Go! Go!" |
This example works because a
is an alias of the table AlbumReviewData
, and
this table has a ROWNUM
pseudo-column. As a result, AlbumReviewData a
represents the scanned rows,
not the value.
However, if you tried to construct the query like this:
SELECT a.ROWNUM, a.albumtitle AS title FROM (SELECT a FROM AlbumReviewData a)
This query does not work. The reason it fails is because the subquery, SELECT a FROM AlbumReviewData a
, returns an AlbumReviewData
value only, and this
value does not have a field called ROWNUM
.