From 79602cd61aa084672794faa8aeed3d4d889c852a Mon Sep 17 00:00:00 2001
From: Kerwin <37063904+zhuangchong@users.noreply.github.com>
Date: Tue, 3 Dec 2024 23:20:15 +0800
Subject: [PATCH] [doc] Add data types in concept (#4625)

---
 docs/content/concepts/data-types.md | 179 ++++++++++++++++++
 docs/content/concepts/spec/_index.md | 2 +-
 .../generated/format_table_configuration.html | 36 ++++
 .../ConfigOptionsDocGenerator.java | 1 +
 4 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100644 docs/content/concepts/data-types.md
 create mode 100644 docs/layouts/shortcodes/generated/format_table_configuration.html

diff --git a/docs/content/concepts/data-types.md b/docs/content/concepts/data-types.md
new file mode 100644
index 000000000000..b33dcd428399
--- /dev/null
+++ b/docs/content/concepts/data-types.md
@@ -0,0 +1,179 @@
+---
+title: "Data Types"
+weight: 7
+type: docs
+aliases:
+- /concepts/data-types.html
+---
+
+# Data Types
+
+A data type describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.
+
+All data types supported by Paimon are as follows:
+
DataType
Description
BOOLEAN
Data type of a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN.
CHAR
+ CHAR(n) +
Data type of a fixed-length character string.

+ The type can be declared using CHAR(n) where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. +
VARCHAR
+ VARCHAR(n)

+ STRING +
Data type of a variable-length character string.

+ The type can be declared using VARCHAR(n) where n is the maximum number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

+ STRING is a synonym for VARCHAR(2147483647). +
BINARY
+ BINARY(n)

+
Data type of a fixed-length binary string (=a sequence of bytes).

+ The type can be declared using BINARY(n) where n is the number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. +
VARBINARY
+ VARBINARY(n)

+ BYTES +
Data type of a variable-length binary string (=a sequence of bytes).

+ The type can be declared using VARBINARY(n) where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

+ BYTES is a synonym for VARBINARY(2147483647). +
DECIMAL
+ DECIMAL(p)
+ DECIMAL(p, s) +
Data type of a decimal number with fixed precision and scale.

+ The type can be declared using DECIMAL(p, s) where p is the number of digits in a number (precision) and s is the number of digits to the right of the decimal point in a number (scale). p must have a value between 1 and 38 (both inclusive). s must have a value between 0 and p (both inclusive). The default value for p is 10. The default value for s is 0. +
TINYINT
Data type of a 1-byte signed integer with values from -128 to 127.
SMALLINT
Data type of a 2-byte signed integer with values from -32,768 to 32,767.
INT
Data type of a 4-byte signed integer with values from -2,147,483,648 to 2,147,483,647.
BIGINT
Data type of an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
FLOAT
Data type of a 4-byte single-precision floating-point number.

+ Compared to the SQL standard, the type does not take parameters. +
DOUBLE
Data type of an 8-byte double-precision floating-point number.
DATE
Data type of a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31.

+ Compared to the SQL standard, the range starts at year 0000. +
TIME
+ TIME(p) +
Data type of a time without time zone consisting of hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999.

+ The type can be declared using TIME(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0. +
TIMESTAMP
+ TIMESTAMP(p) +
Data type of a timestamp without time zone consisting of year-month-day hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999.

+ The type can be declared using TIMESTAMP(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6. +
TIMESTAMP WITH TIME ZONE
+ TIMESTAMP(p) WITH TIME ZONE +
Data type of a timestamp with time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.

+ This type fills the gap between time zone free and time zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session time zone. A conversion from and to int describes the number of seconds since epoch. A conversion from and to long describes the number of milliseconds since epoch. +
ARRAY<t>
Data type of an array of elements with the same subtype.

+ Compared to the SQL standard, the maximum cardinality of an array cannot be specified but is fixed at 2,147,483,647. Also, any valid type is supported as a subtype.

+ The type can be declared using ARRAY<t> where t is the data type of the contained elements. +
MAP<kt, vt>
Data type of an associative array that maps keys (including NULL) to values (including NULL). A map cannot contain duplicate keys; each key can map to at most one value.

+ There is no restriction of element types; it is the responsibility of the user to ensure uniqueness.

+ The type can be declared using MAP<kt, vt> where kt is the data type of the key elements and vt is the data type of the value elements. +
MULTISET<t>
Data type of a multiset (=bag). Unlike a set, it allows for multiple instances for each of its elements with a common subtype. Each unique value (including NULL) is mapped to some multiplicity.

+ There is no restriction of element types; it is the responsibility of the user to ensure uniqueness.

+ The type can be declared using MULTISET<t> where t is the data type of the contained elements. +
ROW<n0 t0, n1 t1, ...>
+ ROW<n0 t0 'd0', n1 t1 'd1', ...> +
Data type of a sequence of fields.

+ A field consists of a field name, field type, and an optional description. The most specific type of a row of a table is a row type. In this case, each column of the row corresponds to the field of the row type that has the same ordinal position as the column.

+ Compared to the SQL standard, an optional field description simplifies the handling of complex structures.

+ A row type is similar to the STRUCT type known from other non-standard-compliant frameworks.

+ The type can be declared using ROW<n0 t0 'd0', n1 t1 'd1', ...> where n is the unique name of a field, t is the logical type of a field, and d is the description of a field. +
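The DECIMAL(p, s) bounds listed in the table can be illustrated with a small check. What follows is only a sketch in plain Java built on `java.math.BigDecimal`; the class and method names (`DecimalTypeSketch`, `isValidDecimal`, `fits`) are hypothetical and are not part of Paimon's API. It assumes a value "fits" when it can be stored without rounding.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Paimon: mirrors the DECIMAL(p, s) rules from the table.
final class DecimalTypeSketch {
    static final int MAX_PRECISION = 38;     // p must be between 1 and 38 (both inclusive)
    static final int DEFAULT_PRECISION = 10; // default value for p
    static final int DEFAULT_SCALE = 0;      // default value for s

    private DecimalTypeSketch() {}

    /** Returns true when 1 <= p <= 38 and 0 <= s <= p. */
    static boolean isValidDecimal(int p, int s) {
        return p >= 1 && p <= MAX_PRECISION && s >= 0 && s <= p;
    }

    /** Returns true when value can be held by DECIMAL(p, s) without rounding. */
    static boolean fits(BigDecimal value, int p, int s) {
        if (!isValidDecimal(p, s)) {
            throw new IllegalArgumentException("invalid DECIMAL(" + p + ", " + s + ")");
        }
        // Trailing zeros are not significant, so strip them before comparing scales.
        if (value.stripTrailingZeros().scale() > s) {
            return false; // more fractional digits than the scale allows
        }
        // Safe: the check above guarantees that no rounding is required here.
        BigDecimal scaled = value.setScale(s, RoundingMode.UNNECESSARY);
        return scaled.precision() <= p; // total digits must not exceed the precision
    }

    public static void main(String[] args) {
        System.out.println(isValidDecimal(DEFAULT_PRECISION, DEFAULT_SCALE));
        System.out.println(fits(new BigDecimal("123.45"), 5, 2));
        System.out.println(fits(new BigDecimal("1234.5"), 5, 2));
    }
}
```

Under these assumptions, `123.45` fits DECIMAL(5, 2), while `1234.5` does not, because padding it to two fractional digits requires six digits in total.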
diff --git a/docs/content/concepts/spec/_index.md b/docs/content/concepts/spec/_index.md
index ef5f03098e20..cc148d6a8b53 100644
--- a/docs/content/concepts/spec/_index.md
+++ b/docs/content/concepts/spec/_index.md
@@ -1,7 +1,7 @@
 ---
 title: Specification
 bookCollapseSection: true
-weight: 7
+weight: 8
 ---