Skip to content

Commit

Permalink
[doc](mtmv)partition limit (apache#485)
Browse files Browse the repository at this point in the history
  • Loading branch information
zddr authored Apr 2, 2024
1 parent f2ad196 commit 45ee426
Show file tree
Hide file tree
Showing 4 changed files with 93 additions and 0 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,43 @@ The materialized view has a dedicated deletion syntax and cannot be deleted thro

Specific syntax can be viewed [DROP ASYNC MATERIALIZED VIEW](../../sql-manual/sql-reference/Data-Definition-Statements/Drop/DROP-ASYNC-MATERIALIZED-VIEW.md)

## Best Practice
### When there are excessive partitions in the base table, the materialized view should only focus on the data from the most recent period.
create table with 3 partitions
```sql
CREATE TABLE t1 (
`k1` INT,
`k2` DATE NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT 'OLAP'
PARTITION BY range(`k2`)
(
PARTITION p26 VALUES [("2024-03-26"),("2024-03-27")),
PARTITION p27 VALUES [("2024-03-27"),("2024-03-28")),
PARTITION p28 VALUES [("2024-03-28"),("2024-03-29"))
)
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
'replication_num' = '1'
);
```
To create a materialized view that focuses only on the data from the most recent day, assuming the current time is 2024-03-28 xx:xx:xx, the materialized view would include only the partition ranging from ["2024-03-28") to ("2024-03-29"). This ensures that the materialized view contains data only for the latest day.
```sql
CREATE MATERIALIZED VIEW mv1
BUILD DEFERRED REFRESH AUTO ON MANUAL
partition by(`k2`)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
'replication_num' = '1',
'partition_sync_limit'='1',
'partition_sync_time_unit'='DAY'
)
AS
SELECT * FROM t1;
```
As another day passes and the current time becomes 2024-03-29 xx:xx:xx, a new partition is added to t1, ranging from ["2024-03-29") to ("2024-03-30"). If the materialized view is refreshed at this point, upon completion of the refresh, the materialized view will contain only this new partition, ranging from ["2024-03-29") to ("2024-03-30").

## The use of materialized views

can be viewed [Query async materialized view](./query-async-materialized-view.md)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,16 @@ The properties unique to materialized views include:

`workload_group`: The name of the workload_group used by the materialized view when performing refresh tasks. This is used to limit the resources used for refreshing data in the materialized view, in order to avoid affecting the operation of other business processes. For details on how to create and use workload_group, refer to [WORKLOAD-GROUP](../../../../admin-manual/workload-group.md)

`partition_sync_limit`: When the partition field of the base table is time (and if it's a string-type time, `partition_date_format` can be set), this attribute can be used to configure the partition range for synchronizing the base table. It works in conjunction with `partition_sync_time_unit`.

For instance, if it's set to 2 and `partition_sync_time_unit` is set to `MONTH`, it means only the partitions and data of the last 2 months in the base table will be synchronized. The minimum value is 1.

As time passes, the materialized view will automatically add and delete partitions during each refresh. For example, if the materialized view currently has data for February and March, next month it will automatically delete the data for February and add data for April.

`partition_sync_time_unit`: time unit, support DAY/MONTH/YEAR(default: DAY)

`partition_date_format`: date format of partition column,for example: "%Y-%m-%d"

##### query

Create a query statement for the materialized view, and the result is the data in the materialized view
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,44 @@ DROP MATERIALIZED VIEW mv1;

具体的语法可查看[DROP ASYNC MATERIALIZED VIEW](../../sql-manual/sql-reference/Data-Definition-Statements/Drop/DROP-ASYNC-MATERIALIZED-VIEW.md)

## 最佳实践
### 基表分区过多,物化视图只关注最近一段时间的数据
创建基表,有三个分区
```sql
CREATE TABLE t1 (
`k1` INT,
`k2` DATE NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT 'OLAP'
PARTITION BY range(`k2`)
(
PARTITION p26 VALUES [("2024-03-26"),("2024-03-27")),
PARTITION p27 VALUES [("2024-03-27"),("2024-03-28")),
PARTITION p28 VALUES [("2024-03-28"),("2024-03-29"))
)
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
'replication_num' = '1'
);
```
创建物化视图,只关注最近一天的数据,如果当前时间为2024-03-28 xx:xx:xx, 这样物化视图会仅有一个分区[("2024-03-28"),("2024-03-29"))
```sql
CREATE MATERIALIZED VIEW mv1
BUILD DEFERRED REFRESH AUTO ON MANUAL
partition by(`k2`)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
'replication_num' = '1',
'partition_sync_limit'='1',
'partition_sync_time_unit'='DAY'
)
AS
SELECT * FROM t1;
```
时间又过了一天,当前时间为2024-03-29 xx:xx:xx`t1`新增一个分区[("2024-03-29"),("2024-03-30")),如果此时刷新物化视图,刷新完成后,物化视图会仅有一个分区[("2024-03-29"),("2024-03-30"))


## 物化视图的使用

请参阅 [查询异步物化视图](./query-async-materialized-view.md)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,14 @@ KEY(k1,k2)

`workload_group`:物化视图执行刷新任务时使用的workload_group名称。用来限制物化视图刷新数据使用的资源,避免影响到其它业务的运行。workload_group创建及使用 [WORKLOAD-GROUP](../../../../admin-manual/workload-group.md)

`partition_sync_limit`:当基表的分区字段为时间时(如果是字符串类型的时间,可以设置partition_date_format),可以用此属性配置同步基表的分区范围,配合`partition_sync_time_unit`一起使用。
例如设置为 2,`partition_sync_time_unit`设置为 MONTH,代表仅同步基表近2个月的分区和数据。最小值为1。
随着时间的变化物化视图每次刷新时都会自动增删分区,例如物化视图现在有2,3两个月的数据,下个月的时候,会自动删除2月的数据,增加4月的数据。

`partition_sync_time_unit`:时间单位,支持 DAY/MONTH/YEAR(默认DAY)

`partition_date_format`:分区字段的时间格式,例如"%Y-%m-%d"

##### query

创建物化视图的查询语句,其结果即为物化视图中的数据
Expand Down

0 comments on commit 45ee426

Please sign in to comment.