From c30299cbf7af24976ffd32d5dcce8b484c845ab4 Mon Sep 17 00:00:00 2001 From: zclllyybb Date: Wed, 29 Nov 2023 15:08:52 +0800 Subject: [PATCH] [docs](partition) Improve auto partitions document (#27662) --- docs/en/docs/admin-manual/config/be-config.md | 22 ++-- .../docs/advanced/partition/auto-partition.md | 106 ++++++++++-------- docs/en/docs/faq/data-faq.md | 5 + .../docs/admin-manual/config/be-config.md | 20 ++-- .../docs/advanced/partition/auto-partition.md | 106 ++++++++++-------- docs/zh-CN/docs/faq/data-faq.md | 8 +- .../doris/service/FrontendServiceImpl.java | 3 +- 7 files changed, 149 insertions(+), 121 deletions(-) diff --git a/docs/en/docs/admin-manual/config/be-config.md b/docs/en/docs/admin-manual/config/be-config.md index cccd7b9ff16b7b..a1ba0674f09a4e 100644 --- a/docs/en/docs/admin-manual/config/be-config.md +++ b/docs/en/docs/admin-manual/config/be-config.md @@ -824,6 +824,16 @@ BaseCompaction:546859: * Default value: 100 * Dynamically modifiable: Yes +#### `olap_table_sink_send_interval_microseconds` + +* Description: While loading data, there's a polling thread keep sending data to corresponding BE from Coordinator's sink node. This thread will check whether there's data to send every `olap_table_sink_send_interval_microseconds` microseconds. +* Default value: 1000 + +#### `olap_table_sink_send_interval_auto_partition_factor` + +* Description: If we load data to a table which enabled auto partition. the interval of `olap_table_sink_send_interval_microseconds` is too slow. In that case the real interval will multiply this factor. +* Default value: 0.001 + ### Thread #### `delete_worker_count` @@ -1290,7 +1300,7 @@ BaseCompaction:546859: * Description: The number of threads making schema changes * Default value: 3 -### `alter_index_worker_count` +#### `alter_index_worker_count` * Description: The number of threads making index change * Default value: 3 @@ -1495,13 +1505,3 @@ Indicates how many tablets failed to load in the data directory. At the same tim * Description: BE Whether to enable the use of java-jni. When enabled, mutual calls between c++ and java are allowed. Currently supports hudi, java-udf, jdbc, max-compute, paimon, preload, avro * Default value: true - -#### `olap_table_sink_send_interval_microseconds` - -* Description: While loading data, there's a polling thread keep sending data to corresponding BE from Coordinator's sink node. This thread will check whether there's data to send every `olap_table_sink_send_interval_microseconds` microseconds. -* Default value: 1000 - -#### `olap_table_sink_send_interval_auto_partition_factor` - -* Description: If we load data to a table which enabled auto partition. the interval of `olap_table_sink_send_interval_microseconds` is too slow. In that case the real interval will multiply this factor. -* Default value: 0.001 diff --git a/docs/en/docs/advanced/partition/auto-partition.md b/docs/en/docs/advanced/partition/auto-partition.md index f601445f874b83..0d84f8363088cc 100644 --- a/docs/en/docs/advanced/partition/auto-partition.md +++ b/docs/en/docs/advanced/partition/auto-partition.md @@ -32,9 +32,58 @@ under the License. The Auto Partitioning feature supports automatic detection of whether the corresponding partition exists during the data import process. If it does not exist, the partition will be created automatically and imported normally. +## Usage Scenarios + +The auto partition function mainly solves the problem that the user expects to partition the table based on a certain column, but the data distribution of the column is scattered or unpredictable, so it is difficult to accurately create the required partitions when building or adjusting the structure of the table, or the number of partitions is so large that it is too cumbersome to create them manually. + +Take the time type partition column as an example, in the [Dynamic Partition](./dynamic-partition) function, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. For real-time user behaviour logs and other scenarios, this feature basically meets the requirements. However, in more complex scenarios, such as dealing with non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this time to improve efficiency we want to partition the data based on this column, but the data may actually involve the partition can not be grasped in advance, or the expected number of required partitions is too large. In this case, dynamic partitioning or manually created partitions can not meet our needs, automatic partitioning function is very good to cover such needs. + +Suppose our table DDL is as follows: + +```sql +CREATE TABLE `DAILY_TRADE_VALUE` +( + `TRADE_DATE` datev2 NULL COMMENT '交易日期', + `TRADE_ID` varchar(40) NULL COMMENT '交易编号', + ...... +) +UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) +PARTITION BY RANGE(`TRADE_DATE`) +( + PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')), + PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')), + PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')), + PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')), + PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')), + PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')), + PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')), + PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')), + PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')), + PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')), + PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')), + PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')), + PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')), + PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')), + PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')), + PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')), + PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')), + PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')), + PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')), + PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')), + PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')), + PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01')) +) +DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 +PROPERTIES ( + "replication_num" = "1" +); +``` + +The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partition. If such partitions need to be changed, or subdivided at a finer level of granularity, it is very tedious to modify them. At this point we can rewrite the table DDL using AUTO PARTITION. + ## Grammer -When building a table, use the following syntax to populate [CREATE-TABLE](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) with the `partition_info` section: +When building a table, use the following syntax to populate [CREATE-TABLE](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE) with the `partition_info` section: 1. AUTO RANGE PARTITION: @@ -94,55 +143,12 @@ When building a table, use the following syntax to populate [CREATE-TABLE](../.. 1. Currently the AUTO RANGE PARTITION function supports only one partition column; 2. In AUTO RANGE PARTITION, the partition function supports only `date_trunc` and the partition column supports only `DATEV2` or `DATETIMEV2` format; -3. In AUTO LIST PARTITION, function calls are not supported. Partitioned columns support BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR datatypes, and partitioned values are enum values. +3. In AUTO LIST PARTITION, function calls are not supported. Partitioned columns support `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` datatypes, and partitioned values are enum values. 4. In AUTO LIST PARTITION, a separate new PARTITION is created for each fetch of a partition column for which the corresponding partition does not currently exist. ## Sample Scenarios -In the [Dynamic Partitioning](./dynamic-partition.md) feature, we support the automatic creation of new partitions to accommodate real-time data at specific time periods. However, in more complex scenarios, such as processing non-real-time data, the partition columns are independent of the current system time. In this case, if you need to partition the data, you need to manually organise the partitions you belong to and create them before importing the data. This is cumbersome when the number of partition columns is large. The automatic partition function solves this problem. - -For example, we have a table as follows: - -```sql -CREATE TABLE `DAILY_TRADE_VALUE` -( - `TRADE_DATE` datev2 NULL - `TRADE_ID` varchar(40) NULL, - ...... -) -UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) -PARTITION BY RANGE(`TRADE_DATE`) -( - PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')), - PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')), - PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')), - PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')), - PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')), - PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')), - PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')), - PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')), - PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')), - PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')), - PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')), - PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')), - PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')), - PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')), - PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')), - PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')), - PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')), - PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')), - PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')), - PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')), - PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')), - PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01')) -) -DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 -PROPERTIES ( - "replication_num" = "1" -); -``` - -The table stores a large amount of business history data, partitioned based on the date the transaction occurred. As you can see when building the table, we need to manually create the partitions in advance. If the data range of the partitioned columns changes, for example, 2022 is added to the above table, we need to create a partition by [ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md) to make changes to the table partition. After using AUTO PARTITION, the table DDL can be changed to: +In the example in the Usage Scenarios section, the table DDL can be rewritten after using AUTO PARTITION: ```sql CREATE TABLE `DAILY_TRADE_VALUE` @@ -171,7 +177,6 @@ After inserting the data and then viewing it again, we could found that the tabl ```sql mysql> insert into `DAILY_TRADE_VALUE` values ('2012-12-13', 1), ('2008-02-03', 2), ('2014-11-11', 3); Query OK, 3 rows affected (0.88 sec) -{'label':'insert_754e2a3926a345ea_854793fb2638f0ec', 'status':'VISIBLE', 'txnId':'20014'} mysql> show partitions from `DAILY_TRADE_VALUE`; +-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ @@ -184,8 +189,11 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; 3 rows in set (0.12 sec) ``` +A partition created by the AUTO PARTITION function has the exact same functional properties as a manually created partition. + ## caveat -- If a partition is created during the insertion or importation of data and the process eventually fails, the created partition is not automatically deleted. +- If a partition is created during the insertion or import of data and the entire import process does not complete (fails or is cancelled), the created partition is not automatically deleted. - Tables that use AUTO PARTITION only have their partitions created automatically instead of manually. The original use of the table and the partitions it creates is the same as for non-AUTO PARTITION tables or partitions. -- When importing data to a table with AUTO PARTITION enabled, the polling interval for data sent by the Coordinator is different from that of a normal table. For details, see `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../../admin-manual/config/be-config.md). +- To prevent accidental creation of too many partitions, we use the [FE Configuration](../../admin-manual/config/fe-config) `max_auto_partition_num` controls the maximum number of partitions an AUTO PARTITION table can hold. This value can be adjusted if necessary +- When importing data to a table with AUTO PARTITION enabled, the polling interval for data sent by the Coordinator is different from that of a normal table. For details, see `olap_table_sink_send_interval_auto_partition_factor` in [BE Configuration](../../admin-manual/config/be-config). diff --git a/docs/en/docs/faq/data-faq.md b/docs/en/docs/faq/data-faq.md index ca96200d12e1c2..2c81dad558bda3 100644 --- a/docs/en/docs/faq/data-faq.md +++ b/docs/en/docs/faq/data-faq.md @@ -169,9 +169,14 @@ FROM kafka "property.group.id" = "xxx" ); ``` + ### Q12. ERROR 1105 (HY000): errCode = 2, detailMessage = (192.168.90.91)[CANCELLED][INTERNAL_ERROR]error setting certificate verify locations: CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none ``` yum install -y ca-certificates ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-certificates.crt ``` + +### Q13. create partition failed. partition numbers will exceed limit variable max_auto_partition_num + +To prevent accidental creation of too many partitions when importing data for auto-partitioned tables, we use the FE configuration item `max_auto_partition_num` to control the maximum number of partitions to be created automatically for such tables. If you really need to create more partitions, please modify this config item of FE Master node. diff --git a/docs/zh-CN/docs/admin-manual/config/be-config.md b/docs/zh-CN/docs/admin-manual/config/be-config.md index 058b66ef710908..b4e7c887a83cba 100644 --- a/docs/zh-CN/docs/admin-manual/config/be-config.md +++ b/docs/zh-CN/docs/admin-manual/config/be-config.md @@ -849,6 +849,16 @@ BaseCompaction:546859: * 默认值: 100 * 可动态修改:是 +#### `olap_table_sink_send_interval_microseconds`. + +* 描述: 数据导入时,Coordinator 的 sink 节点有一个轮询线程持续向对应BE发送数据。该线程将每隔 `olap_table_sink_send_interval_microseconds` 微秒检查是否有数据要发送。 +* 默认值:1000 + +#### `olap_table_sink_send_interval_auto_partition_factor`. + +* 描述: 如果我们向一个启用了自动分区的表导入数据,那么 `olap_table_sink_send_interval_microseconds` 的时间间隔就会太慢。在这种情况下,实际间隔将乘以该系数。 +* 默认值:0.001 + ### 线程 #### `delete_worker_count` @@ -1524,13 +1534,3 @@ load tablets from header failed, failed tablets size: xxx, path=xxx * 描述: BE 是否开启使用java-jni,开启后允许 c++ 与 java 之间的相互调用。目前已经支持hudi、java-udf、jdbc、max-compute、paimon、preload、avro * 默认值: true - -#### `olap_table_sink_send_interval_microseconds`. - -* 描述: 数据导入时,Coordinator 的 sink 节点有一个轮询线程持续向对应BE发送数据。该线程将每隔 `olap_table_sink_send_interval_microseconds` 微秒检查是否有数据要发送。 -* 默认值:1000 - -#### `olap_table_sink_send_interval_auto_partition_factor`. - -* 描述: 如果我们向一个启用了自动分区的表导入数据,那么 `olap_table_sink_send_interval_microseconds` 的时间间隔就会太慢。在这种情况下,实际间隔将乘以该系数。 -* 默认值:0.001 diff --git a/docs/zh-CN/docs/advanced/partition/auto-partition.md b/docs/zh-CN/docs/advanced/partition/auto-partition.md index ef0da5032dede8..185b1f3b9e5509 100644 --- a/docs/zh-CN/docs/advanced/partition/auto-partition.md +++ b/docs/zh-CN/docs/advanced/partition/auto-partition.md @@ -32,9 +32,58 @@ under the License. 自动分区功能支持了在导入数据过程中自动检测是否存在对应所属分区。如果不存在,则会自动创建分区并正常进行导入。 +## 使用场景 + +自动分区功能主要解决了用户预期基于某列对表进行分区操作,但该列的数据分布比较零散或者难以预测,在建表或调整表结构时难以准确创建所需分区,或者分区数量过多以至于手动创建过于繁琐的问题。 + +以时间类型分区列为例,在[动态分区](./dynamic-partition)功能中,我们支持了按特定时间周期自动创建新分区以容纳实时数据。对于实时的用户行为日志等场景该功能基本能够满足需求。但在一些更复杂的场景下,例如处理非实时数据时,分区列与当前系统时间无关,且包含大量离散值。此时为提高效率我们希望依据此列对数据进行分区,但数据实际可能涉及的分区无法预先掌握,或者预期所需分区数量过大。这种情况下动态分区或者手动创建分区无法满足我们的需求,自动分区功能很好地覆盖了此类需求。 + +假设我们的表DDL如下: + +```sql +CREATE TABLE `DAILY_TRADE_VALUE` +( + `TRADE_DATE` datev2 NULL COMMENT '交易日期', + `TRADE_ID` varchar(40) NULL COMMENT '交易编号', + ...... +) +UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) +PARTITION BY RANGE(`TRADE_DATE`) +( + PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')), + PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')), + PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')), + PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')), + PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')), + PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')), + PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')), + PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')), + PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')), + PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')), + PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')), + PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')), + PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')), + PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')), + PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')), + PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')), + PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')), + PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')), + PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')), + PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')), + PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')), + PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01')) +) +DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 +PROPERTIES ( + "replication_num" = "1" +); +``` + +该表内存储了大量业务历史数据,依据交易发生的日期进行分区。可以看到在建表时,我们需要预先手动创建分区。如果分区列的数据范围发生变化,例如上表中增加了2022年的数据,则我们需要通过[ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION)对表的分区进行更改。如果这种分区需要变更,或者进行更细粒度的细分,修改起来非常繁琐。此时我们就可以使用AUTO PARTITION改写该表DDL。 + ## 语法 -建表时,使用以下语法填充[CREATE-TABLE](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md)时的`partition_info`部分: +建表时,使用以下语法填充[CREATE-TABLE](../../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE)时的`partition_info`部分: 1. AUTO RANGE PARTITION: @@ -94,55 +143,12 @@ under the License. 1. 当前自动分区功能仅支持一个分区列; 2. 在AUTO RANGE PARTITION中,分区函数仅支持`date_trunc`,分区列仅支持`DATEV2`或者`DATETIMEV2`格式; -3. 在AUTO LIST PARTITION中,不支持函数调用,分区列支持 BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR 数据类型,分区值为枚举值。 +3. 在AUTO LIST PARTITION中,不支持函数调用,分区列支持 `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` 数据类型,分区值为枚举值。 4. 在AUTO LIST PARTITION中,分区列的每个当前不存在对应分区的取值,都会创建一个独立的新PARTITION。 ## 场景示例 -在[动态分区](./dynamic-partition.md)功能中,我们支持了按特定时间周期自动创建新分区以容纳实时数据。但在更复杂的场景下,例如处理非实时数据时,分区列与当前系统时间无关。此时如果需要进行数据分区操作,则需要用户手动整理所属分区并在数据导入前进行创建。在分区列基数较大时比较繁琐。自动分区功能解决了这一问题。 - -例如,我们有一张表如下: - -```sql -CREATE TABLE `DAILY_TRADE_VALUE` -( - `TRADE_DATE` datev2 NULL COMMENT '交易日期', - `TRADE_ID` varchar(40) NULL COMMENT '交易编号', - ...... -) -UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`) -PARTITION BY RANGE(`TRADE_DATE`) -( - PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')), - PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')), - PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')), - PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')), - PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')), - PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')), - PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')), - PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')), - PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')), - PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')), - PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')), - PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')), - PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')), - PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')), - PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')), - PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')), - PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')), - PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')), - PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')), - PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')), - PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')), - PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01')) -) -DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10 -PROPERTIES ( - "replication_num" = "1" -); -``` - -该表内存储了大量业务历史数据,依据交易发生的日期进行分区。可以看到在建表时,我们需要预先手动创建分区。如果分区列的数据范围发生变化,例如上表中增加了2022年的数据,则我们需要通过[ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION.md)对表的分区进行更改。在使用AUTO PARTITION后,该表DDL可以改为: +在使用场景一节中的示例,在使用AUTO PARTITION后,该表DDL可以改写为: ```sql CREATE TABLE `DAILY_TRADE_VALUE` @@ -171,7 +177,6 @@ Empty set (0.12 sec) ```sql mysql> insert into `DAILY_TRADE_VALUE` values ('2012-12-13', 1), ('2008-02-03', 2), ('2014-11-11', 3); Query OK, 3 rows affected (0.88 sec) -{'label':'insert_754e2a3926a345ea_854793fb2638f0ec', 'status':'VISIBLE', 'txnId':'20014'} mysql> show partitions from `DAILY_TRADE_VALUE`; +-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+ @@ -184,8 +189,11 @@ mysql> show partitions from `DAILY_TRADE_VALUE`; 3 rows in set (0.12 sec) ``` +经过自动分区功能所创建的PARTITION,与手动创建的PARTITION具有完全一致的功能性质。 + ## 注意事项 -- 在数据的插入或导入过程中如果创建了分区,而最终整个过程失败,被创建的分区不会被自动删除。 +- 在数据的插入或导入过程中如果创建了分区,而整个导入过程没有完成(失败或被取消),被创建的分区不会被自动删除。 - 使用AUTO PARTITION的表,只是分区创建方式上由手动转为了自动。表及其所创建分区的原本使用方法都与非AUTO PARTITION的表或分区相同。 -- 向开启了AUTO PARTITION的表导入数据时,Coordinator发送数据的轮询间隔与普通表有所不同。具体请见[BE配置项](../../admin-manual/config/be-config.md)中的`olap_table_sink_send_interval_auto_partition_factor`。 +- 为防止意外创建过多分区,我们通过[FE配置项](../../admin-manual/config/fe-config)中的`max_auto_partition_num`控制了一个AUTO PARTITION表最大容纳分区数。如有需要可以调整该值 +- 向开启了AUTO PARTITION的表导入数据时,Coordinator发送数据的轮询间隔与普通表有所不同。具体请见[BE配置项](../../admin-manual/config/be-config)中的`olap_table_sink_send_interval_auto_partition_factor`。 diff --git a/docs/zh-CN/docs/faq/data-faq.md b/docs/zh-CN/docs/faq/data-faq.md index a7051f161b42d1..908f64d47c1155 100644 --- a/docs/zh-CN/docs/faq/data-faq.md +++ b/docs/zh-CN/docs/faq/data-faq.md @@ -167,8 +167,14 @@ FROM kafka "property.group.id" = "xxx" ); ``` + ### Q12. ERROR 1105 (HY000): errCode = 2, detailMessage = (192.168.90.91)[CANCELLED][INTERNAL_ERROR]error setting certificate verify locations: CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none + ``` yum install -y ca-certificates ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-certificates.crt -``` \ No newline at end of file +``` + +### Q13. create partition failed. partition numbers will exceed limit variable max_auto_partition_num + +对自动分区表导入数据时,为防止意外创建过多分区,我们使用了FE配置项`max_auto_partition_num`管控此类表自动创建时的最大分区数。如果确需创建更多分区,请修改FE master节点的该配置项。 diff --git a/fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java b/fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java index 66751d6a9123ca..46a5726113842d 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java +++ b/fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java @@ -3177,7 +3177,8 @@ public TCreatePartitionResult createPartition(TCreatePartitionRequest request) t if (partitionNum > Config.max_auto_partition_num) { olapTable.writeUnlock(); String errorMessage = String.format( - "create partition failed. partition numbers %d will exceed limit variable max_auto_partition_num%d", + "create partition failed. partition numbers %d will exceed limit variable " + + "max_auto_partition_num %d", partitionNum, Config.max_auto_partition_num); LOG.warn(errorMessage); errorStatus.setErrorMsgs(Lists.newArrayList(errorMessage));