
Merge branch 'master' into rest-catalog
jerry-024 authored Jan 9, 2025
2 parents 10159cf + 13ed895 commit c56388c
Showing 208 changed files with 4,358 additions and 1,567 deletions.
2 changes: 1 addition & 1 deletion NOTICE
@@ -1,5 +1,5 @@
Apache Paimon
-Copyright 2023-2024 The Apache Software Foundation
+Copyright 2023-2025 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
9 changes: 5 additions & 4 deletions docs/config.toml
@@ -34,11 +34,11 @@ pygmentsUseClasses = true
# we change the version for the complete docs when forking of a release branch
# etc.
# The full version string as referenced in Maven (e.g. 1.2.1)
Version = "1.0-SNAPSHOT"
Version = "1.1-SNAPSHOT"

# For stable releases, leave the bugfix version out (e.g. 1.2). For snapshot
# release this should be the same as the regular version
VersionTitle = "1.0-SNAPSHOT"
VersionTitle = "1.1-SNAPSHOT"

# The branch for this version of Apache Paimon
Branch = "master"
@@ -67,11 +67,12 @@ pygmentsUseClasses = true
["JavaDocs", "//paimon.apache.org/docs/master/api/java/"],
]

StableDocs = "https://paimon.apache.org/docs/0.9"
StableDocs = "https://paimon.apache.org/docs/1.0"

PreviousDocs = [
["master", "https://paimon.apache.org/docs/master"],
["stable", "https://paimon.apache.org/docs/0.9"],
["stable", "https://paimon.apache.org/docs/1.0"],
["1.0", "https://paimon.apache.org/docs/1.0"],
["0.9", "https://paimon.apache.org/docs/0.9"],
["0.8", "https://paimon.apache.org/docs/0.8"],
]
25 changes: 14 additions & 11 deletions docs/content/cdc-ingestion/mysql-cdc.md
@@ -34,11 +34,15 @@ Download `CDC Bundled Jar` and put them under <FLINK_HOME>/lib/.

| Version | Bundled Jar |
|---------|-------------------------------------------------------------------------------------------------------------------------|
-| 2.3.x | <br/>Only supported in versions below 0.8.2<li> flink-sql-connector-mysql-cdc-2.3.x.jar |
-| 2.4.x | <br/>Only supported in versions below 0.8.2<li> flink-sql-connector-mysql-cdc-2.4.x.jar |
-| 3.0.x | <br/>Only supported in versions below 0.8.2<li> flink-sql-connector-mysql-cdc-3.0.x.jar <li> flink-cdc-common-3.0.x.jar |
| 3.1.x | <li> flink-sql-connector-mysql-cdc-3.1.x.jar <li> mysql-connector-java-8.0.27.jar |

+{{< hint danger >}}
+Only cdc 3.1+ is supported.
+
+You can download the `flink-connector-mysql-cdc` jar package by clicking [here](https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-mysql-cdc/).
+
+{{< /hint >}}

## Synchronizing Tables

By using [MySqlSyncTableAction](/docs/{{< param Branch >}}/api/java/org/apache/paimon/flink/action/cdc/mysql/MySqlSyncTableAction) in a Flink DataStream job or directly through `flink run`, users can synchronize one or multiple tables from MySQL into one Paimon table.
@@ -48,7 +52,7 @@ To use this feature through `flink run`, run the following shell command.
```bash
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-{{< version >}}.jar \
-mysql_sync_table
+mysql_sync_table \
--warehouse <warehouse-path> \
--database <database-name> \
--table <table-name> \
@@ -66,7 +70,7 @@
If the Paimon table you specify does not exist, this action will automatically create the table. Its schema will be derived from all specified MySQL tables. If the Paimon table already exists, its schema will be compared against the schema of all specified MySQL tables.
-Example 1: synchronize tables into one Paimon table
+### Example 1: synchronize tables into one Paimon table
```bash
<FLINK_HOME>/bin/flink run \
@@ -93,7 +97,7 @@
As the example shows, `table-name` in `mysql_conf` supports regular expressions, so you can monitor all tables that
match the pattern. The schemas of the matched tables will be merged into one Paimon table schema.
-Example 2: synchronize shards into one Paimon table
+### Example 2: synchronize shards into one Paimon table
You can also set 'database-name' with a regular expression to capture multiple databases. A typical scenario: a
table 'source_table' is split across databases 'source_db1', 'source_db2', ...; you can then synchronize the data of
all the shards into one Paimon table, as the sketch below shows.
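The full command is collapsed in this diff. A minimal sketch, assuming the connection flags from the template above (hostname, credentials, and the warehouse path are placeholders):

```bash
# database-name below is a regular expression matching every shard
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-{{< version >}}.jar \
    mysql_sync_table \
    --warehouse hdfs:///path/to/warehouse \
    --database test_db \
    --table source_table \
    --mysql_conf hostname=127.0.0.1 \
    --mysql_conf username=root \
    --mysql_conf password=123456 \
    --mysql_conf database-name='source_db.+' \
    --mysql_conf table-name='source_table'
```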
@@ -130,7 +134,7 @@ To use this feature through `flink run`, run the following shell command.
```bash
<FLINK_HOME>/bin/flink run \
/path/to/paimon-flink-action-{{< version >}}.jar \
-mysql_sync_database
+mysql_sync_database \
--warehouse <warehouse-path> \
--database <database-name> \
[--ignore_incompatible <true/false>] \
@@ -155,7 +159,7 @@ Only tables with primary keys will be synchronized.
For each MySQL table to be synchronized, if the corresponding Paimon table does not exist, this action will automatically create the table. Its schema will be derived from all specified MySQL tables. If the Paimon table already exists, its schema will be compared against the schema of all specified MySQL tables.
-Example 1: synchronize entire database
+### Example 1: synchronize entire database
```bash
<FLINK_HOME>/bin/flink run \
@@ -174,7 +178,7 @@
--table_conf sink.parallelism=4
```
-Example 2: synchronize newly added tables under database
+### Example 2: synchronize newly added tables under database
Let's say a Flink job is initially synchronizing tables [product, user, address]
under database `source_db`. The command to submit the job looks like:
@@ -205,7 +209,6 @@ position automatically.
The command to recover from the previous snapshot and add new tables to synchronize looks like:
```bash
<FLINK_HOME>/bin/flink run \
--fromSavepoint savepointPath \
@@ -227,7 +230,7 @@
You can set `--mode combined` to enable synchronizing newly added tables without restarting the job, as the sketch after this hint shows.
{{< /hint >}}
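A minimal sketch of resubmitting in combined mode — all flags except `--mode` follow the earlier examples; hostname, credentials, and paths are placeholders:

```bash
# combined mode lets one job pick up newly added tables without a restart
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-{{< version >}}.jar \
    mysql_sync_database \
    --warehouse hdfs:///path/to/warehouse \
    --database test_db \
    --mode combined \
    --mysql_conf hostname=127.0.0.1 \
    --mysql_conf username=root \
    --mysql_conf password=123456 \
    --mysql_conf database-name=source_db
```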
-Example 3: synchronize and merge multiple shards
+### Example 3: synchronize and merge multiple shards
Let's say you have multiple database shards `db1`, `db2`, ... and each database has tables `tbl1`, `tbl2`, .... You can
synchronize all the `db.+.tbl.+` tables into `test_db.tbl1`, `test_db.tbl2` ... with the following command:
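The command itself is collapsed in this diff; a minimal sketch, assuming the same connection flags as above (hostname, credentials, and paths are placeholders):

```bash
# database-name and table-name are regular expressions; tables with the
# same name across the matched shards are merged into one Paimon table
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-{{< version >}}.jar \
    mysql_sync_database \
    --warehouse hdfs:///path/to/warehouse \
    --database test_db \
    --mysql_conf hostname=127.0.0.1 \
    --mysql_conf username=root \
    --mysql_conf password=123456 \
    --mysql_conf database-name='db.+' \
    --mysql_conf table-name='tbl.+'
```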
6 changes: 3 additions & 3 deletions docs/content/concepts/spec/fileindex.md
@@ -83,7 +83,7 @@ redundant bytes: var bytes (for compatibility with later versions)
BODY: column index bytes + column index bytes + column index bytes + .......
</pre>

-## Column Index Bytes: BloomFilter
+## Index: BloomFilter

Define `'file-index.bloom-filter.columns'`.

@@ -94,7 +94,7 @@ Content of bloom filter index is simple:
This class uses a 64-bit long hash. Only the number of hash functions (one integer) and the bit set bytes are stored.
Byte types (like varchar, binary, etc.) are hashed with xx hash; numeric types are hashed with the [specified number hash](http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm).

-## Column Index Bytes: Bitmap
+## Index: Bitmap

Define `'file-index.bitmap.columns'`.

@@ -137,7 +137,7 @@ offset: 4 bytes int (when it is negative, it represents t

Integers are all BIG_ENDIAN.

-## Column Index Bytes: Bit-Slice Index Bitmap
+## Index: Bit-Slice Index Bitmap

The BSI file index is a numeric range index used to accelerate range queries; it can be used together with the bitmap index.

24 changes: 12 additions & 12 deletions docs/content/concepts/system-tables.md
@@ -190,12 +190,12 @@ SELECT * FROM my_table$files;
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
-| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
-| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
-| [2] | 0 | data-83aa7973-060b-40b6-8c8... | orc | 0 | 0 | 1 | 605 | [d] | [d] | {cnt=0, val=0, word=0} | {cnt=2, val=32, word=d} | {cnt=2, val=32, word=d} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
-| [5] | 0 | data-3d304f4a-bcea-44dc-a13... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=5, val=51, word=c} | {cnt=5, val=51, word=c} | 1691551246788 | 1691551246152 |2023-02-24T16:06:21.166|
-| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246722 | 1691551246273 |2023-02-24T16:06:21.166|
-| [4] | 0 | data-2c9b7095-65b7-4013-a7a... | orc | 0 | 0 | 1 | 593 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=4, val=12, word=a} | {cnt=4, val=12, word=a} | 1691551246321 | 1691551246109 |2023-02-24T16:06:21.166|
+| {3} | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
+| {2} | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
+| {2} | 0 | data-83aa7973-060b-40b6-8c8... | orc | 0 | 0 | 1 | 605 | [d] | [d] | {cnt=0, val=0, word=0} | {cnt=2, val=32, word=d} | {cnt=2, val=32, word=d} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
+| {5} | 0 | data-3d304f4a-bcea-44dc-a13... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=5, val=51, word=c} | {cnt=5, val=51, word=c} | 1691551246788 | 1691551246152 |2023-02-24T16:06:21.166|
+| {1} | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246722 | 1691551246273 |2023-02-24T16:06:21.166|
+| {4} | 0 | data-2c9b7095-65b7-4013-a7a... | orc | 0 | 0 | 1 | 593 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=4, val=12, word=a} | {cnt=4, val=12, word=a} | 1691551246321 | 1691551246109 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
6 rows in set
*/
@@ -207,9 +207,9 @@ SELECT * FROM my_table$files /*+ OPTIONS('scan.snapshot-id'='1') */;
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
-| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
-| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
-| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
+| {3} | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
+| {2} | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
+| {1} | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
3 rows in set
*/
@@ -352,7 +352,7 @@ SELECT * FROM my_table$partitions;
+---------------+----------------+--------------------+--------------------+------------------------+
| partition | record_count | file_size_in_bytes| file_count| last_update_time|
+---------------+----------------+--------------------+--------------------+------------------------+
-| [1] | 1 | 645 | 1 | 2024-06-24 10:25:57.400|
+| {1} | 1 | 645 | 1 | 2024-06-24 10:25:57.400|
+---------------+----------------+--------------------+--------------------+------------------------+
*/
```
@@ -401,8 +401,8 @@ SELECT * FROM my_table$table_indexes;
+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+
| partition | bucket | index_type | file_name | file_size | row_count | dv_ranges |
+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+
-| [2024-10-01] | 0 | HASH | index-70abfebf-149e-4796-9f... | 12 | 3 | <NULL> |
-| [2024-10-01] | 0 | DELETION_VECTORS | index-633857e7-cdce-47d2-87... | 33 | 1 | [(data-346cb9c8-4032-4d66-a... |
+| {2024-10-01} | 0 | HASH | index-70abfebf-149e-4796-9f... | 12 | 3 | <NULL> |
+| {2024-10-01} | 0 | DELETION_VECTORS | index-633857e7-cdce-47d2-87... | 33 | 1 | [(data-346cb9c8-4032-4d66-a... |
+--------------------------------+-------------+--------------------------------+--------------------------------+----------------------+----------------------+--------------------------------+
2 rows in set
*/
2 changes: 1 addition & 1 deletion docs/content/flink/procedures.md
@@ -148,7 +148,7 @@ All available procedures are listed below.
-- based on the specified snapshot <br/>
CALL [catalog.]sys.create_tag(`table` => 'identifier', tag => 'tagName', snapshot_id => snapshotId) <br/>
-- based on the latest snapshot <br/>
-CALL [catalog.]sys.create_tag(`table` => 'identifier', snapshot_id => 'tagName') <br/><br/>
+CALL [catalog.]sys.create_tag(`table` => 'identifier', tag => 'tagName') <br/><br/>
-- Use indexed argument<br/>
-- based on the specified snapshot <br/>
CALL [catalog.]sys.create_tag('identifier', 'tagName', snapshotId) <br/>
17 changes: 17 additions & 0 deletions docs/content/flink/sql-query.md
@@ -105,6 +105,23 @@ If you want to see `DELETE` records, you can use the audit_log table:
SELECT * FROM t$audit_log /*+ OPTIONS('incremental-between' = '12,20') */;
```

### Batch Incremental between Auto-created Tags

You can use `incremental-between` to query the incremental changes between two tags. But an auto-created tag may
not be created in time because of data delay.

For example, assume that tags '2024-12-01', '2024-12-02' and '2024-12-04' are auto-created daily, and that data for 12/03
was delayed and ingested together with data for 12/04. If you query the incremental changes between consecutive tags
without knowing that the tag for 12/03 was never created, you would use `incremental-between` with '2024-12-01,2024-12-02',
'2024-12-02,2024-12-03' and '2024-12-03,2024-12-04' respectively, and you would get an error that the tag '2024-12-03' doesn't exist.

A new option `incremental-to-auto-tag` was introduced for this scenario. You specify only the end tag, and Paimon
finds an earlier tag and returns the changes between them. If the end tag or the earlier tag doesn't exist, the result is empty.

For example, querying 'incremental-to-auto-tag=2024-12-01' or 'incremental-to-auto-tag=2024-12-03' returns an empty
result; querying 'incremental-to-auto-tag=2024-12-02' returns the changes between 12/01 and 12/02; and querying
'incremental-to-auto-tag=2024-12-04' returns the changes between 12/02 and 12/04.
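A sketch of the corresponding query, mirroring the `incremental-between` hint shown earlier (the table name `t` is illustrative):

```sql
-- returns the changes between the automatically found earlier tag
-- ('2024-12-02') and the end tag '2024-12-04'
SELECT * FROM t /*+ OPTIONS('incremental-to-auto-tag' = '2024-12-04') */;
```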

## Streaming Query

By default, Streaming read produces the latest snapshot on the table upon first startup,