[doc] Split Primary Key Table doc to multiple pages
JingsongLi committed Jan 4, 2024
1 parent aea7cd5 commit 6574026
Showing 13 changed files with 680 additions and 586 deletions.
4 changes: 1 addition & 3 deletions docs/content/concepts/basic-concepts.md
@@ -39,9 +39,7 @@ Partitioning is an optional way of dividing a table into related parts based on
By partitioning, users can efficiently operate on a slice of records in the table. See [file layouts]({{< ref "concepts/file-layouts" >}}) for how files are divided into multiple partitions.

{{< hint info >}}
-
- Partition keys must be a subset of primary keys if primary keys are defined. If you need cross partition upsert (primary keys not contain all partition fields), you should use [Dynamic Bucket]({{< ref "concepts/primary-key-table#dynamic-bucket">}}) mode.
-
+ If you need cross partition upsert (primary keys not contain all partition fields), see [Cross partition Upsert]({{< ref "concepts/primary-key-table/data-distribution#cross-partitions-upsert-dynamic-bucket-mode">}}) mode.
{{< /hint >}}

## Bucket
4 changes: 2 additions & 2 deletions docs/content/concepts/file-layouts.md
@@ -49,7 +49,7 @@ A manifest file is a file containing changes about LSM data files and changelog

## Data Files

- Data files are grouped by partitions and buckets. Each bucket directory contains an [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) and its [changelog files]({{< ref "concepts/primary-key-table#changelog-producers" >}}).
+ Data files are grouped by partitions and buckets. Each bucket directory contains an [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) and its [changelog files]({{< ref "concepts/primary-key-table/changelog-producer" >}}).

Currently, Paimon supports using orc(default), parquet and avro as data file's format.

@@ -65,7 +65,7 @@ Records within a data file are sorted by their primary keys. Within a sorted run

{{< img src="/img/sorted-runs.png">}}

- As you can see, different sorted runs may have overlapping primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "concepts/primary-key-table#merge-engines" >}}) and the timestamp of each record.
+ As you can see, different sorted runs may have overlapping primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "concepts/primary-key-table/merge-engine" >}}) and the timestamp of each record.

New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
