From 79602cd61aa084672794faa8aeed3d4d889c852a Mon Sep 17 00:00:00 2001
From: Kerwin <37063904+zhuangchong@users.noreply.github.com>
Date: Tue, 3 Dec 2024 23:20:15 +0800
Subject: [PATCH] [doc] Add data types in concept (#4625)

---
 docs/content/concepts/data-types.md           | 179 ++++++++++++++++++
 docs/content/concepts/spec/_index.md          |   2 +-
 .../generated/format_table_configuration.html |  36 ++++
 .../ConfigOptionsDocGenerator.java            |   1 +
 4 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100644 docs/content/concepts/data-types.md
 create mode 100644 docs/layouts/shortcodes/generated/format_table_configuration.html

diff --git a/docs/content/concepts/data-types.md b/docs/content/concepts/data-types.md
new file mode 100644
index 000000000000..b33dcd428399
--- /dev/null
+++ b/docs/content/concepts/data-types.md
@@ -0,0 +1,179 @@
+---
+title: "Data Types"
+weight: 7
+type: docs
+aliases:
+- /concepts/data-types.html
+---
+
+# Data Types
+
+A data type describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.
+
+All data types supported by Paimon are as follows:
+
+| DataType | Description |
+|---|---|
+| `BOOLEAN` | Data type of a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN. |
+| `CHAR`<br/>`CHAR(n)` | Data type of a fixed-length character string.<br/><br/>The type can be declared using `CHAR(n)` where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. |
+| `VARCHAR`<br/>`VARCHAR(n)`<br/>`STRING` | Data type of a variable-length character string.<br/><br/>The type can be declared using `VARCHAR(n)` where n is the maximum number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.<br/><br/>`STRING` is a synonym for `VARCHAR(2147483647)`. |
+| `BINARY`<br/>`BINARY(n)` | Data type of a fixed-length binary string (=a sequence of bytes).<br/><br/>The type can be declared using `BINARY(n)` where n is the number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1. |
+| `VARBINARY`<br/>`VARBINARY(n)`<br/>`BYTES` | Data type of a variable-length binary string (=a sequence of bytes).<br/><br/>The type can be declared using `VARBINARY(n)` where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.<br/><br/>`BYTES` is a synonym for `VARBINARY(2147483647)`. |
+| `DECIMAL`<br/>`DECIMAL(p)`<br/>`DECIMAL(p, s)` | Data type of a decimal number with fixed precision and scale.<br/><br/>The type can be declared using `DECIMAL(p, s)` where p is the number of digits in a number (precision) and s is the number of digits to the right of the decimal point in a number (scale). p must have a value between 1 and 38 (both inclusive). s must have a value between 0 and p (both inclusive). The default value for p is 10. The default value for s is 0. |
+| `TINYINT` | Data type of a 1-byte signed integer with values from -128 to 127. |
+| `SMALLINT` | Data type of a 2-byte signed integer with values from -32,768 to 32,767. |
+| `INT` | Data type of a 4-byte signed integer with values from -2,147,483,648 to 2,147,483,647. |
+| `BIGINT` | Data type of an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
+| `FLOAT` | Data type of a 4-byte single precision floating point number.<br/><br/>Compared to the SQL standard, the type does not take parameters. |
+| `DOUBLE` | Data type of an 8-byte double precision floating point number. |
+| `DATE` | Data type of a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31.<br/><br/>Compared to the SQL standard, the range starts at year 0000. |
+| `TIME`<br/>`TIME(p)` | Data type of a time without time zone consisting of hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999.<br/><br/>The type can be declared using `TIME(p)` where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0. |
+| `TIMESTAMP`<br/>`TIMESTAMP(p)` | Data type of a timestamp without time zone consisting of year-month-day hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999.<br/><br/>The type can be declared using `TIMESTAMP(p)` where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6. |
+| `TIMESTAMP WITH TIME ZONE`<br/>`TIMESTAMP(p) WITH TIME ZONE` | Data type of a timestamp with time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.<br/><br/>This type fills the gap between time zone free and time zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session time zone. A conversion from and to int describes the number of seconds since epoch. A conversion from and to long describes the number of milliseconds since epoch. |
+| `ARRAY<t>` | Data type of an array of elements with the same subtype.<br/><br/>Compared to the SQL standard, the maximum cardinality of an array cannot be specified but is fixed at 2,147,483,647. Also, any valid type is supported as a subtype.<br/><br/>The type can be declared using `ARRAY<t>` where t is the data type of the contained elements. |
+| `MAP<kt, vt>` | Data type of an associative array that maps keys (including NULL) to values (including NULL). A map cannot contain duplicate keys; each key can map to at most one value.<br/><br/>There is no restriction on element types; it is the responsibility of the user to ensure uniqueness.<br/><br/>The type can be declared using `MAP<kt, vt>` where kt is the data type of the key elements and vt is the data type of the value elements. |
+| `MULTISET<t>` | Data type of a multiset (=bag). Unlike a set, it allows for multiple instances for each of its elements with a common subtype. Each unique value (including NULL) is mapped to some multiplicity.<br/><br/>There is no restriction on element types; it is the responsibility of the user to ensure uniqueness.<br/><br/>The type can be declared using `MULTISET<t>` where t is the data type of the contained elements. |
+| `ROW<n0 t0, n1 t1, ...>`<br/>`ROW<n0 t0 'd0', n1 t1 'd1', ...>` | Data type of a sequence of fields.<br/><br/>A field consists of a field name, field type, and an optional description. The most specific type of a row of a table is a row type. In this case, each column of the row corresponds to the field of the row type that has the same ordinal position as the column.<br/><br/>Compared to the SQL standard, an optional field description simplifies the handling of complex structures.<br/><br/>A row type is similar to the STRUCT type known from other non-standard-compliant frameworks.<br/><br/>The type can be declared using `ROW<n0 t0 'd0', n1 t1 'd1', ...>` where n is the unique name of a field, t is the logical type of a field, and d is the description of a field. |
+
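The parameter bounds and defaults listed in the table can be sketched as a few small checks. This is an illustration only, not Paimon's implementation: the function names (`check_length`, `check_decimal`, `check_fractional_precision`, `signed_range`) are hypothetical, and the only facts it encodes are the limits and defaults stated in the table.

```python
# Hypothetical helpers; the bounds and defaults come from the table above,
# the names and structure are assumptions made for illustration.

MAX_LENGTH = 2_147_483_647  # upper bound for CHAR/VARCHAR/BINARY/VARBINARY


def check_length(n=1):
    """Validate n for CHAR(n), VARCHAR(n), BINARY(n), VARBINARY(n).

    n defaults to 1 when no length is specified.
    """
    if not 1 <= n <= MAX_LENGTH:
        raise ValueError(f"length must be in [1, {MAX_LENGTH}], got {n}")
    return n


def check_decimal(p=10, s=0):
    """Validate DECIMAL(p, s): 1 <= p <= 38 and 0 <= s <= p.

    Defaults are p = 10 and s = 0.
    """
    if not 1 <= p <= 38:
        raise ValueError(f"precision must be in [1, 38], got {p}")
    if not 0 <= s <= p:
        raise ValueError(f"scale must be in [0, {p}], got {s}")
    return p, s


def check_fractional_precision(p):
    """Validate p for TIME(p) and TIMESTAMP(p): 0 <= p <= 9.

    The default differs per type (0 for TIME, 6 for TIMESTAMP),
    so the caller supplies it.
    """
    if not 0 <= p <= 9:
        raise ValueError(f"precision must be in [0, 9], got {p}")
    return p


def signed_range(num_bytes):
    """Value range of a two's-complement signed integer of the given width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    print(check_decimal())         # DECIMAL with defaults
    print(check_decimal(38, 38))   # largest precision, scale equal to precision
    for name, width in [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]:
        print(name, signed_range(width))
```

`signed_range` reproduces the TINYINT, SMALLINT, INT, and BIGINT ranges in the table from their byte widths, which is a quick way to confirm that the listed limits are the usual two's-complement bounds.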