Merge branch 'master' into kandy-3763
hadoopkandy authored Aug 7, 2024
2 parents ae055d5 + c11b950 commit 3223962
Showing 333 changed files with 9,759 additions and 2,477 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/e2e-tests-1.19-jdk11.yml
@@ -53,6 +53,6 @@ jobs:
. .github/workflows/utils.sh
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone -Pflink-1.19
env:
MAVEN_OPTS: -Xmx4096m
2 changes: 1 addition & 1 deletion .github/workflows/e2e-tests-1.19.yml
@@ -52,6 +52,6 @@ jobs:
. .github/workflows/utils.sh
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone -Pflink-1.19
env:
MAVEN_OPTS: -Xmx4096m
58 changes: 58 additions & 0 deletions .github/workflows/e2e-tests-1.20-jdk11.yml
@@ -0,0 +1,58 @@
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

name: End to End Tests Flink 1.20 on JDK 11

on:
issue_comment:
types: [created, edited, deleted]

# daily run
schedule:
- cron: "0 0 * * *"

env:
JDK_VERSION: 11

jobs:
build:
if: |
github.event_name == 'schedule' ||
(contains(github.event.comment.html_url, '/pull/') && contains(github.event.comment.body, '/jdk11'))
runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up JDK ${{ env.JDK_VERSION }}
uses: actions/setup-java@v2
with:
java-version: ${{ env.JDK_VERSION }}
distribution: 'adopt'
- name: Build Flink 1.20
run: mvn -T 1C -B clean install -DskipTests
- name: Test Flink 1.20
timeout-minutes: 60
run: |
# run tests with a random timezone to surface timezone-related bugs
. .github/workflows/utils.sh
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone
env:
MAVEN_OPTS: -Xmx4096m
57 changes: 57 additions & 0 deletions .github/workflows/e2e-tests-1.20.yml
@@ -0,0 +1,57 @@
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

name: End to End Tests Flink 1.20

on:
push:
pull_request:
paths-ignore:
- 'docs/**'
- '**/*.md'

env:
JDK_VERSION: 8

concurrency:
group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.event.number || github.run_id }}
cancel-in-progress: true

jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up JDK ${{ env.JDK_VERSION }}
uses: actions/setup-java@v2
with:
java-version: ${{ env.JDK_VERSION }}
distribution: 'adopt'
- name: Build Flink 1.20
run: mvn -T 1C -B clean install -DskipTests
- name: Test Flink 1.20
timeout-minutes: 60
run: |
# run tests with a random timezone to surface timezone-related bugs
. .github/workflows/utils.sh
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
mvn -T 1C -B test -pl paimon-e2e-tests -Duser.timezone=$jvm_timezone
env:
MAVEN_OPTS: -Xmx4096m
2 changes: 1 addition & 1 deletion .github/workflows/unitcase-flink-jdk11.yml
@@ -53,7 +53,7 @@ jobs:
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
test_modules=""
for suffix in 1.15 1.16 1.17 1.18 1.19 common; do
for suffix in 1.15 1.16 1.17 1.18 1.19 1.20 common; do
test_modules+="org.apache.paimon:paimon-flink-${suffix},"
done
test_modules="${test_modules%,}"
2 changes: 1 addition & 1 deletion .github/workflows/utitcase-flink.yml
@@ -52,7 +52,7 @@ jobs:
jvm_timezone=$(random_timezone)
echo "JVM timezone is set to $jvm_timezone"
test_modules=""
for suffix in 1.15 1.16 1.17 1.18 1.19 common; do
for suffix in 1.15 1.16 1.17 1.18 1.19 1.20 common; do
test_modules+="org.apache.paimon:paimon-flink-${suffix},"
done
test_modules="${test_modules%,}"
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
![Paimon](https://paimon.apache.org/assets/paimon_blue.svg)
![Paimon](https://github.com/apache/paimon/blob/master/docs/static/paimon-simple.png)

[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![Get on Slack](https://img.shields.io/badge/slack-join-orange.svg)](https://the-asf.slack.com/archives/C053Q2NCW8G)
6 changes: 4 additions & 2 deletions docs/content/append-table/query.md
@@ -59,8 +59,7 @@ be stored directly in the manifest, otherwise in the directory of the data file.
which has a separate file definition and can contain different types of indexes with multiple columns.

Different file indexes may be efficient in different scenarios. For example, a bloom filter may speed up queries in the point lookup
scenario. Using a bitmap may consume more space but can result in greater accuracy. Though we only implement bloom filter
currently, other types of index will be supported in the future.
scenario. Using a bitmap may consume more space but can result in greater accuracy.

Currently, file indexes are only supported in append-only tables.

@@ -69,6 +68,9 @@ Currently, file index is only supported in append-only table.
* `file-index.bloom-filter.<column_name>.fpp` to configure the false positive probability.
* `file-index.bloom-filter.<column_name>.items` to configure the expected number of distinct items in one data file.

`Bitmap`:
* `file-index.bitmap.columns`: specify the columns that need a bitmap index.
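
A minimal sketch of applying these options, assuming a table `T` with an indexable column `c1`; `file-index.bloom-filter.columns` is an assumed enabling option here, by analogy with the bitmap one:

```sql
-- enable a bloom filter index on column c1 and tune it
-- ('file-index.bloom-filter.columns' is an assumption, mirroring the bitmap option above)
ALTER TABLE T SET (
    'file-index.bloom-filter.columns' = 'c1',
    'file-index.bloom-filter.c1.fpp' = '0.05',
    'file-index.bloom-filter.c1.items' = '10000'
);

-- enable a bitmap index on column c1
ALTER TABLE T SET ('file-index.bitmap.columns' = 'c1');
```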

More filter types will be supported...

If you want to add a file index to an existing table without any rewrite, you can use the `rewrite_file_index` procedure. Before
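
A minimal sketch of invoking it from Flink SQL, assuming the procedure takes the target table identifier as its argument:

```sql
-- rebuild file index metadata for an existing table without rewriting data files
CALL sys.rewrite_file_index('default.T');
```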
4 changes: 2 additions & 2 deletions docs/content/engines/overview.md
@@ -30,13 +30,13 @@ under the License.

| Engine | Version | Batch Read | Batch Write | Create Table | Alter Table | Streaming Write | Streaming Read | Batch Overwrite | DELETE & UPDATE | MERGE INTO | Time Travel |
|:-------------------------------------------------------------------------------:|:-------------:|:-----------:|:-----------:|:-------------:|:-------------:|:----------------:|:----------------:|:---------------:|:------------------:|:-----------:|:-----------:|
| Flink | 1.15 - 1.19 |||| ✅(1.17+) |||| ✅(1.17+) |||
| Flink | 1.15 - 1.20 |||| ✅(1.17+) |||| ✅(1.17+) |||
| Spark | 3.1 - 3.5 || ✅(3.2+) ||| ✅(3.3+) | ✅(3.3+) | ✅(3.2+) | ✅(3.2+) | ✅(3.2+) | ✅(3.3+) |
| Hive | 2.1 - 3.1 |||||||||||
| Trino | 420 - 439 || ✅(427+) | ✅(427+) | ✅(427+) |||||||
| Presto | 0.236 - 0.280 |||||||||||
| [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/) | 3.1+ |||||||||||
| [Doris](https://doris.apache.org/docs/lakehouse/datalake-analytics/paimon) | 2.0.6+ |||||||||||
| [Doris](https://doris.apache.org/docs/lakehouse/datalake-analytics/paimon) | 2.0.6+ |||||||||||

## Streaming Engines

6 changes: 4 additions & 2 deletions docs/content/flink/procedures.md
@@ -258,13 +258,15 @@ All available procedures are listed below.
<li>table: the target table identifier. Cannot be empty.</li>
<li>expiration_time: the expiration interval of a partition. A partition will be expired if its lifetime exceeds this value. Partition time is extracted from the partition value.</li>
<li>timestamp_formatter: the formatter to format timestamp from string.</li>
<li>timestamp_pattern: the pattern to get a timestamp from partitions.</li>
<li>expire_strategy: specifies the expiration strategy for partition expiration; possible values are 'values-time' and 'update-time', with 'values-time' as the default.</li>
</td>
<td>
-- for Flink 1.18<br/><br/>
CALL sys.expire_partitions('default.T', '1 d', 'yyyy-MM-dd', 'values-time')<br/><br/>
CALL sys.expire_partitions('default.T', '1 d', 'yyyy-MM-dd', '$dt', 'values-time')<br/><br/>
-- for Flink 1.19 and later<br/><br/>
CALL sys.expire_partitions(`table` => 'default.T', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd', expire_strategy => 'values-time')<br/><br/>
CALL sys.expire_partitions(`table` => 'default.T', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd', expire_strategy => 'values-time')<br/>
CALL sys.expire_partitions(`table` => 'default.T', expiration_time => '1 d', timestamp_formatter => 'yyyy-MM-dd HH:mm', timestamp_pattern => '$dt $hm', expire_strategy => 'values-time')<br/><br/>
</td>
</tr>
<tr>
4 changes: 3 additions & 1 deletion docs/content/flink/quick-start.md
@@ -30,7 +30,7 @@ This documentation is a guide for using Paimon in Flink.

## Jars

Paimon currently supports Flink 1.19, 1.18, 1.17, 1.16, 1.15. We recommend the latest Flink version for a better experience.
Paimon currently supports Flink 1.20, 1.19, 1.18, 1.17, 1.16, 1.15. We recommend the latest Flink version for a better experience.

Download the jar file with corresponding version.

@@ -39,6 +39,7 @@ Download the jar file with corresponding version.
| Version | Type | Jar |
|--------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Flink 1.20 | Bundled Jar | [paimon-flink-1.20-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.20/{{< version >}}/paimon-flink-1.20-{{< version >}}.jar) |
| Flink 1.19 | Bundled Jar | [paimon-flink-1.19-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.19/{{< version >}}/paimon-flink-1.19-{{< version >}}.jar) |
| Flink 1.18 | Bundled Jar | [paimon-flink-1.18-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.18/{{< version >}}/paimon-flink-1.18-{{< version >}}.jar) |
| Flink 1.17 | Bundled Jar | [paimon-flink-1.17-{{< version >}}.jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-flink-1.17/{{< version >}}/paimon-flink-1.17-{{< version >}}.jar) |
@@ -52,6 +53,7 @@ Download the jar file with corresponding version.

| Version | Type | Jar |
|--------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| Flink 1.20 | Bundled Jar | [paimon-flink-1.20-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.20/{{< version >}}/) |
| Flink 1.19 | Bundled Jar | [paimon-flink-1.19-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.19/{{< version >}}/) |
| Flink 1.18 | Bundled Jar | [paimon-flink-1.18-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.18/{{< version >}}/) |
| Flink 1.17 | Bundled Jar | [paimon-flink-1.17-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-flink-1.17/{{< version >}}/) |
63 changes: 59 additions & 4 deletions docs/content/maintenance/dedicated-compaction.md
@@ -81,9 +81,7 @@ To run a dedicated job for compaction, follow these instructions.

{{< tabs "dedicated-compaction-job" >}}

{{< tab "Flink" >}}

Flink SQL currently does not support statements related to compactions, so we have to submit the compaction job through `flink run`.
{{< tab "Flink Action Jar" >}}

Run the following command to submit a compaction job for the table.

@@ -130,6 +128,25 @@ For more usage of the compact action, see
{{< /tab >}}
{{< tab "Flink" >}}
Run the following SQL:
```sql
-- compact table
CALL sys.compact(`table` => 'default.T');

-- compact table with options
CALL sys.compact(`table` => 'default.T', `options` => 'sink.parallelism=4');

-- compact table partition
CALL sys.compact(`table` => 'default.T', `partitions` => 'p=0');

-- compact table partition with filter
CALL sys.compact(`table` => 'default.T', `where` => 'dt>10 and h<20');
```
{{< /tab >}}
{{< /tabs >}}
{{< hint info >}}
@@ -143,7 +160,7 @@ You can run the following command to submit a compaction job for multiple databa
{{< tabs "database-compaction-job" >}}
{{< tab "Flink" >}}
{{< tab "Flink Action Jar" >}}
```bash
<FLINK_HOME>/bin/flink run \
@@ -226,6 +243,26 @@ For more usage of the compact_database action, see
{{< /tab >}}
{{< tab "Flink" >}}
Run the following SQL:
```sql
CALL sys.compact_database('includingDatabases')
CALL sys.compact_database('includingDatabases', 'mode')
CALL sys.compact_database('includingDatabases', 'mode', 'includingTables')
CALL sys.compact_database('includingDatabases', 'mode', 'includingTables', 'excludingTables')
CALL sys.compact_database('includingDatabases', 'mode', 'includingTables', 'excludingTables', 'tableOptions')
-- example
CALL sys.compact_database('db1|db2', 'combined', 'table_.*', 'ignore', 'sink.parallelism=4')
```
{{< /tab >}}
{{< /tabs >}}
## Sort Compact
@@ -234,6 +271,10 @@ If your table is configured with [dynamic bucket primary key table]({{< ref "pri
or [append table]({{< ref "append-table/overview" >}}),
you can trigger a compaction with a specified column sort order to speed up queries.
{{< tabs "sort-compaction-job" >}}
{{< tab "Flink Action Jar" >}}
```bash
<FLINK_HOME>/bin/flink run \
-D execution.runtime-mode=batch \
@@ -253,3 +294,17 @@ There are two new configuration in `Sort Compact`
The sort parallelism is the same as the sink parallelism; you can dynamically specify it by adding the conf `--table_conf sink.parallelism=<value>`.
{{< /tab >}}
{{< tab "Flink" >}}
Run the following SQL:
```sql
-- sort compact table
CALL sys.compact(`table` => 'default.T', order_strategy => 'zorder', order_by => 'a,b')
```
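
A sketch combining sort compaction with a dynamically specified sink parallelism, assuming `sys.compact` accepts the same `options` parameter here as in the plain compaction examples above:

```sql
-- sort compact with an explicit sink parallelism (the options parameter is assumed to apply here)
CALL sys.compact(`table` => 'default.T', order_strategy => 'zorder', order_by => 'a,b', `options` => 'sink.parallelism=4')
```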
{{< /tab >}}
{{< /tabs >}}