[core] Rename paimon: Table Store to Paimon
JingsongLi committed Mar 18, 2023
1 parent 0524726 commit 8856b61
Showing 64 changed files with 173 additions and 172 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -1,8 +1,8 @@
# Flink Table Store
# Paimon

Flink Table Store is a data lake storage for streaming updates/deletes changelog ingestion and high-performance queries in real time.
Paimon is a data lake storage for streaming updates/deletes changelog ingestion and high-performance queries in real time.

Flink Table Store is developed under the umbrella of [Apache Flink](https://flink.apache.org/).
Paimon is developed under the umbrella of [Apache Flink](https://flink.apache.org/).

## Documentation & Getting Started

6 changes: 3 additions & 3 deletions docs/README.md
@@ -1,7 +1,7 @@
This README gives an overview of how to build and contribute to the
documentation of Flink Table Store.
documentation of Paimon.

The documentation is included with the source of Flink Table Store in order to ensure
The documentation is included with the source of Paimon in order to ensure
that you always have docs corresponding to your checked-out version.

# Requirements
@@ -85,7 +85,7 @@ the page:

### ShortCodes

Flink Table Store uses [shortcodes](https://gohugo.io/content-management/shortcodes/) to add
Paimon uses [shortcodes](https://gohugo.io/content-management/shortcodes/) to add
custom functionality to its documentation markdown.

Its implementation and documentation can be found at
16 changes: 8 additions & 8 deletions docs/content/_index.md
@@ -1,5 +1,5 @@
---
title: Apache Flink Table Store
title: Apache Paimon
type: docs
bookToc: false
---
@@ -22,26 +22,26 @@ specific language governing permissions and limitations
under the License.
-->

# Apache Flink Table Store
# Apache Paimon

Flink Table Store is a unified storage to build dynamic tables for both streaming and
Paimon is a unified storage to build dynamic tables for both streaming and
batch processing in Flink, supporting high-speed data ingestion and timely data query.
Table Store offers the following core capabilities:
Paimon offers the following core capabilities:
- Support storage of large datasets and allow read/write in both batch and streaming mode.
- Support streaming queries with minimum latency down to milliseconds.
- Support Batch/OLAP queries with minimum latency down to the second level.
- Support incremental snapshots for stream consumption by default, so users do not need to combine different pipelines by themselves.

{{< columns >}}
## Try Table Store
## Try Paimon

If you’re interested in playing around with Flink Table Store, check out our
If you’re interested in playing around with Paimon, check out our
quick start guide with [Flink]({{< ref "engines/flink" >}}), [Spark]({{< ref "engines/spark3" >}}) or [Hive]({{< ref "engines/hive" >}}). It provides a step by
step introduction to the APIs and guides you through real applications.

<--->

## Get Help with Table Store
## Get Help with Paimon

If you get stuck, check out our [community support
resources](https://flink.apache.org/community.html). In particular, Apache
@@ -50,5 +50,5 @@ any Apache project, and is a great way to get help quickly.

{{< /columns >}}

Flink Table Store is developed under the umbrella of
Paimon is developed under the umbrella of
[Apache Flink](https://flink.apache.org/).
4 changes: 2 additions & 2 deletions docs/content/concepts/basic-concepts.md
@@ -32,7 +32,7 @@ A snapshot captures the state of a table at some point in time. Users can access

## Partition

Table Store adopts the same partitioning concept as Apache Hive to separate data.
Paimon adopts the same partitioning concept as Apache Hive to separate data.

Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.
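
A minimal Flink SQL sketch of a partitioned table, using the standard `PARTITIONED BY` clause; the table and column names are hypothetical:

```sql
-- Hypothetical sales table partitioned by date and city.
-- Queries that filter on the partition columns only need to read the matching partitions.
CREATE TABLE sales (
    order_id BIGINT,
    amount   DECIMAL(10, 2),
    dt       STRING,
    city     STRING
) PARTITIONED BY (dt, city);
```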

@@ -56,6 +56,6 @@ See [file layouts]({{< ref "concepts/file-layouts" >}}) for how files are divide

## Consistency Guarantees

Table Store writers uses two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}) at commit time.
Paimon writers use a two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}) at commit time.

For any two writers modifying a table at the same time, as long as they do not modify the same bucket, their commits are serializable. If they modify the same bucket, only snapshot isolation is guaranteed. That is, the final table state may be a mix of the two commits, but no changes are lost.
4 changes: 2 additions & 2 deletions docs/content/concepts/external-log-systems.md
@@ -26,7 +26,7 @@ under the License.

# External Log Systems

Aside from [underlying table files]({{< ref "concepts/primary-key-table#changelog-producers" >}}), changelog of Table Store can also be stored into or consumed from an external log system, such as Kafka. By specifying `log.system` table property, users can choose which external log system to use.
Aside from [underlying table files]({{< ref "concepts/primary-key-table#changelog-producers" >}}), changelog of Paimon can also be stored into or consumed from an external log system, such as Kafka. By specifying `log.system` table property, users can choose which external log system to use.

If an external log system is used, all records written into table files will also be written into the log system. Changes produced by the streaming queries will thus come from the log system instead of table files.

@@ -36,7 +36,7 @@ By default, changes in the log systems are visible to consumers only after a sna

However, users can also specify the table property `'log.consistency' = 'eventual'` so that changelog written into the log system can be immediately consumed by the consumers, without waiting for the next snapshot. This behavior decreases the latency of changelog, but it can only guarantee the at-least-once semantics (that is, consumers might see duplicated records) due to possible failures.

If `'log.consistency' = 'eventual'` is set, in order to achieve correct results, Table Store source in Flink will automatically adds a "normalize" operator for deduplication. This operator persists the values of each key in states. As one can easily tell, this operator will be very costly and should be avoided.
If `'log.consistency' = 'eventual'` is set, in order to achieve correct results, the Paimon source in Flink will automatically add a "normalize" operator for deduplication. This operator persists the value of each key in state. As one can easily tell, this operator will be very costly and should be avoided.
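
For illustration, a hedged Flink SQL sketch of a table backed by Kafka as the external log system. The `log.system` and `log.consistency` properties come from this page; the table definition and the `kafka.*` option names are assumptions made for the example:

```sql
CREATE TABLE orders (
    order_id BIGINT,
    status   STRING,
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'log.system' = 'kafka',                      -- also write the changelog to an external log system
    'kafka.bootstrap.servers' = 'broker1:9092',  -- assumed option name for the Kafka endpoint
    'kafka.topic' = 'orders_log',                -- assumed option name for the target topic
    'log.consistency' = 'eventual'               -- changelog visible before the next snapshot (at-least-once)
);
```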

## Supported Log Systems

8 changes: 4 additions & 4 deletions docs/content/concepts/file-layouts.md
@@ -26,7 +26,7 @@ under the License.

# File Layouts

All files of a table are stored under one base directory. Table Store files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Table Store readers can recursively access all records from the table.
All files of a table are stored under one base directory. Paimon files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Paimon readers can recursively access all records from the table.

{{< img src="/img/file-layout.png">}}

@@ -53,7 +53,7 @@ Data files are grouped by partitions and buckets. Each bucket directory contains

## LSM Trees

Table Store adapts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts about LSM trees.
Paimon adopts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts behind LSM trees.

### Sorted Runs

@@ -73,6 +73,6 @@ When more and more records are written into the LSM tree, the number of sorted r

To limit the number of sorted runs, we have to merge several sorted runs into one big sorted run once in a while. This procedure is called compaction.

However, compaction is a resource intensive procedure which consumes a certain amount of CPU time and disk IO, so too frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Table Store currently adapts a compaction strategy similar to Rocksdb's [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).
However, compaction is a resource-intensive procedure which consumes a certain amount of CPU time and disk IO, so overly frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Paimon currently adopts a compaction strategy similar to RocksDB's [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).

By default, when Table Store writers append records to the LSM tree, they'll also perform compactions as needed. Users can also choose to perform all compactions in a dedicated compaction job. See [dedicated compaction job]({{< ref "maintenance/write-performance#dedicated-compaction-job" >}}) for more info.
By default, when Paimon writers append records to the LSM tree, they'll also perform compactions as needed. Users can also choose to perform all compactions in a dedicated compaction job. See [dedicated compaction job]({{< ref "maintenance/write-performance#dedicated-compaction-job" >}}) for more info.
10 changes: 5 additions & 5 deletions docs/content/concepts/overview.md
@@ -26,7 +26,7 @@ under the License.

# Overview

Flink Table Store is a unified storage to build dynamic tables for both streaming and
Paimon is a unified storage to build dynamic tables for both streaming and
batch processing in Flink, supporting high-speed data ingestion and timely data query.

## Architecture
@@ -35,18 +35,18 @@ batch processing in Flink, supporting high-speed data ingestion and timely data

As shown in the architecture above:

**Read/Write:** Table Store supports a versatile way to read/write data and perform OLAP queries.
**Read/Write:** Paimon supports a versatile way to read/write data and perform OLAP queries.
- For reads, it supports consuming data
- from historical snapshots (in batch mode),
- from the latest offset (in streaming mode), or
- reading incremental snapshots in a hybrid way.
- For writes, it supports streaming synchronization from the changelog of databases (CDC) or batch
insert/overwrite from offline data.

**Ecosystem:** In addition to Apache Flink, Table Store also supports read by other computation
**Ecosystem:** In addition to Apache Flink, Paimon also supports reads by other computation
engines like Apache Hive, Apache Spark and Trino.

**Internal:** Under the hood, Table Store uses a hybrid storage architecture with a lake format to store
**Internal:** Under the hood, Paimon uses a hybrid storage architecture with a lake format to store
historical data and a queue system to store incremental data. The former stores the columnar files on
the filesystem/object-store and uses the LSM tree structure to support a large volume of data updates
and high-performance queries. The latter uses Apache Kafka to capture data in real-time.
@@ -62,7 +62,7 @@ There are three types of connectors in Flink SQL.
- Batch storage, such as Apache Hive, which supports various operations
of traditional batch processing, including `INSERT OVERWRITE`.

Flink Table Store provides table abstraction. It is used in a way that
Paimon provides table abstraction. It is used in a way that
does not differ from the traditional database:
- In Flink `batch` execution mode, it acts like a Hive table and
supports various operations of Batch SQL. Query it to see the
18 changes: 9 additions & 9 deletions docs/content/concepts/primary-key-table.md
@@ -28,13 +28,13 @@ under the License.

Changelog table is the default table type when creating a table. Users can insert, update or delete records in the table.

Primary keys are a set of columns that are unique for each record. Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.
Primary keys are a set of columns that are unique for each record. Paimon imposes an ordering of data, which means the system will sort the primary key within each bucket. Using this feature, users can achieve high performance by adding filter conditions on the primary key.

By [defining primary keys]({{< ref "how-to/creating-tables#tables-with-primary-keys" >}}) on a changelog table, users can access the following features.
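
As a hedged sketch, a changelog table with a primary key might be declared as follows in Flink SQL; the table, columns, and the `'bucket'` option are assumptions for illustration, while the primary-key syntax follows the creating-tables guide linked above:

```sql
CREATE TABLE users (
    user_id BIGINT,
    name    STRING,
    dt      STRING,
    PRIMARY KEY (dt, user_id) NOT ENFORCED  -- records are sorted by the primary key within each bucket
) PARTITIONED BY (dt)
WITH (
    'bucket' = '4'  -- assumed bucket count; each bucket keeps its data sorted by the key
);

-- Filters on the primary key can skip most of the data, giving fast point lookups.
SELECT * FROM users WHERE dt = '2023-03-18' AND user_id = 42;
```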

## Merge Engines

When Table Store sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
When Paimon sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.

{{< hint info >}}
Always set `table.exec.sink.upsert-materialize` to `NONE` in Flink SQL TableConfig; sink upsert-materialize may
@@ -44,15 +44,15 @@ result in strange behavior. When the input is out of order, we recommend that yo

### Deduplicate

`deduplicate` merge engine is the default merge engine. Table Store will only keep the latest record and throw away other records with the same primary keys.
`deduplicate` merge engine is the default merge engine. Paimon will only keep the latest record and throw away other records with the same primary keys.

Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
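
A hedged example of this default behavior; the table and values are hypothetical, and declaring `'merge-engine' = 'deduplicate'` explicitly is optional since it is the default:

```sql
CREATE TABLE dedup_demo (
    id INT,
    v  STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'merge-engine' = 'deduplicate'
);

-- Two records share the same primary key: only the latest one, (1, 'updated'), is kept.
INSERT INTO dedup_demo VALUES (1, 'initial');
INSERT INTO dedup_demo VALUES (1, 'updated');
```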

### Partial Update

By specifying `'merge-engine' = 'partial-update'`, users can set columns of a record across multiple updates and finally get a complete record. Specifically, value fields are updated to the latest data one by one under the same primary key, but null values are not overwritten.

For example, let's say Table Store receives three records:
For example, let's say Paimon receives three records:
- `<1, 23.0, 10, NULL>`
- `<1, NULL, NULL, 'This is a book'>`
- `<1, 25.2, NULL, NULL>`
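
A hedged sketch of a table that could hold the three records above, with hypothetical column names; only `'merge-engine' = 'partial-update'` is taken from this page:

```sql
CREATE TABLE partial_demo (
    id          INT,
    price       DOUBLE,
    quantity    INT,
    description STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'merge-engine' = 'partial-update'
);
-- Following the rule above (latest non-null value wins per field), the three writes
-- accumulate under key 1 into a merged row like <1, 25.2, 10, 'This is a book'>.
```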
@@ -126,7 +126,7 @@ The `changelog-producer` table property only affects changelog from files. It do

### None

By default, no extra changelog producer will be applied to the writer of table. Table Store source can only see the merged changes across snapshots, like what keys are removed and what are the new values of some keys.
By default, no extra changelog producer will be applied to the writer of table. Paimon source can only see the merged changes across snapshots, like what keys are removed and what are the new values of some keys.

However, these merged changes cannot form a complete changelog, because we can't read the old values of the keys directly from them. Merged changes require the consumers to "remember" the values of each key and to rewrite the values without seeing the old ones. Some consumers, however, need the old values to ensure correctness or efficiency.

Expand All @@ -138,9 +138,9 @@ To conclude, `none` changelog producers are best suited for consumers such as a

### Input

By specifying `'changelog-producer' = 'input'`, Table Store writers rely on their inputs as a source of complete changelog. All input records will be saved in separated [changelog files]({{< ref "concepts/file-layouts" >}}) and will be given to the consumers by Table Store sources.
By specifying `'changelog-producer' = 'input'`, Paimon writers rely on their inputs as a source of complete changelog. All input records will be saved in separate [changelog files]({{< ref "concepts/file-layouts" >}}) and will be given to the consumers by Paimon sources.

`input` changelog producer can be used when Table Store writers' inputs are complete changelog, such as from a database CDC, or generated by Flink stateful computation.
The `input` changelog producer can be used when Paimon writers' inputs are complete changelogs, such as those from database CDC or generated by Flink stateful computation.
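
A hedged sketch of declaring this producer on a table fed by CDC input; the table and column names are hypothetical, while the property value comes from this page:

```sql
CREATE TABLE cdc_mirror (
    id INT,
    v  STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    -- the input (e.g. a database CDC stream) is already a complete changelog, so it is
    -- written to separate changelog files and served to downstream consumers as-is
    'changelog-producer' = 'input'
);
```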

{{< img src="/img/changelog-producer-input.png">}}

Expand All @@ -152,7 +152,7 @@ This is an experimental feature.

If your input can’t produce a complete changelog but you still want to get rid of the costly normalize operator, you may consider using the `'lookup'` changelog producer.

By specifying `'changelog-producer' = 'lookup'`, Table Store will generate changelog through `'lookup'` before committing the data writing.
By specifying `'changelog-producer' = 'lookup'`, Paimon will generate changelog through `'lookup'` before committing the data writing.

{{< img src="/img/changelog-producer-lookup.png">}}

Expand Down Expand Up @@ -194,7 +194,7 @@ Lookup will cache data on the memory and local disk, you can use the following o
If you think the resource consumption of 'lookup' is too large, you can consider using the 'full-compaction' changelog producer,
which decouples data writing from changelog generation and is better suited for scenarios with high latency (for example, 10 minutes).

By specifying `'changelog-producer' = 'full-compaction'`, Table Store will compare the results between full compactions and produce the differences as changelog. The latency of changelog is affected by the frequency of full compactions.
By specifying `'changelog-producer' = 'full-compaction'`, Paimon will compare the results between full compactions and produce the differences as changelog. The latency of changelog is affected by the frequency of full compactions.

By specifying the `changelog-producer.compaction-interval` table property, users can define the maximum interval between two full compactions and thus bound changelog latency. The interval is `0s` by default, so every checkpoint triggers a full compaction and generates a changelog.
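
A hedged example combining the two properties described above; the table definition and the interval value are illustrative only:

```sql
CREATE TABLE metrics (
    k INT,
    v BIGINT,
    PRIMARY KEY (k) NOT ENFORCED
) WITH (
    'changelog-producer' = 'full-compaction',
    -- force a full compaction at least every 10 minutes, bounding changelog latency
    'changelog-producer.compaction-interval' = '10 min'
);
```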

12 changes: 6 additions & 6 deletions docs/content/engines/flink.md
@@ -26,11 +26,11 @@ under the License.

# Flink

This documentation is a guide for using Table Store in Flink.
This documentation is a guide for using Paimon in Flink.

## Preparing Table Store Jar File
## Preparing Paimon Jar File

Table Store currently supports Flink 1.16, 1.15 and 1.14. We recommend the latest Flink version for a better experience.
Paimon currently supports Flink 1.16, 1.15 and 1.14. We recommend the latest Flink version for a better experience.

{{< stable >}}

@@ -48,7 +48,7 @@ You can also manually build bundled jar from the source code.

{{< unstable >}}

You are using an unreleased version of Table Store so you need to manually build bundled jar from the source code.
You are using an unreleased version of Paimon, so you need to manually build the bundled jar from the source code.

{{< /unstable >}}

Expand All @@ -69,7 +69,7 @@ If you haven't downloaded Flink, you can [download Flink 1.16](https://flink.apa
tar -xzf flink-*.tgz
```

**Step 2: Copy Table Store Bundled Jar**
**Step 2: Copy Paimon Bundled Jar**

Copy the Paimon bundled jar to the `lib` directory of your Flink home.

@@ -111,7 +111,7 @@ You can now start Flink SQL client to execute SQL scripts.
**Step 5: Create a Catalog and a Table**

```sql
-- if you're trying out Table Store in a distributed environment,
-- if you're trying out Paimon in a distributed environment,
-- warehouse path should be set to a shared file system, such as HDFS or OSS
CREATE CATALOG my_catalog WITH (
'type'='paimon',