From d9a1b80a41169c41eb2628790d8bc4e7fc68467c Mon Sep 17 00:00:00 2001
From: Jingsong
Date: Mon, 25 Nov 2024 15:23:44 +0800
Subject: [PATCH] [doc] Document changelog producer to use lookup

---
 .../primary-key-table/changelog-producer.md   | 25 +++++++++++--------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/docs/content/primary-key-table/changelog-producer.md b/docs/content/primary-key-table/changelog-producer.md
index bf7a23fae2a5..011f7b6f27a7 100644
--- a/docs/content/primary-key-table/changelog-producer.md
+++ b/docs/content/primary-key-table/changelog-producer.md
@@ -58,9 +58,11 @@ By specifying `'changelog-producer' = 'input'`, Paimon writers rely on their inp
 
 ## Lookup
 
-If your input can’t produce a complete changelog but you still want to get rid of the costly normalized operator, you may consider using the `'lookup'` changelog producer.
+If your input can’t produce a complete changelog but you still want to get rid of the costly normalized operator, you
+may consider using the `'lookup'` changelog producer.
 
-By specifying `'changelog-producer' = 'lookup'`, Paimon will generate changelog through `'lookup'` before committing the data writing.
+By specifying `'changelog-producer' = 'lookup'`, Paimon will generate changelog through `'lookup'` before committing
+the data writing (You can also enable [Async Compaction]({{< ref "primary-key-table/compaction#asynchronous-compaction" >}})).
 
 {{< img src="/img/changelog-producer-lookup.png">}}
 
@@ -105,23 +107,26 @@ important for performance).
 
 ## Full Compaction
 
-If you think the resource consumption of 'lookup' is too large, you can consider using 'full-compaction' changelog producer,
-which can decouple data writing and changelog generation, and is more suitable for scenarios with high latency (For example, 10 minutes).
+You can also consider using 'full-compaction' changelog producer to generate changelog, and is more suitable for scenarios
+with large latency (For example, 30 minutes).
 
-By specifying `'changelog-producer' = 'full-compaction'`, Paimon will compare the results between full compactions and produce the differences as changelog. The latency of changelog is affected by the frequency of full compactions.
+1. By specifying `'changelog-producer' = 'full-compaction'`, Paimon will compare the results between full compactions and
+produce the differences as changelog. The latency of changelog is affected by the frequency of full compactions.
+2. By specifying `full-compaction.delta-commits` table property, full compaction will be constantly triggered after delta
+commits (checkpoints). This is set to 1 by default, so each checkpoint will have a full compression and generate a
+changelog.
 
-By specifying `full-compaction.delta-commits` table property, full compaction will be constantly triggered after delta commits (checkpoints). This is set to 1 by default, so each checkpoint will have a full compression and generate a change log.
+Generally speaking, the cost and consumption of full compaction are high, so we recommend using `'lookup'` changelog
+producer.
 
 {{< img src="/img/changelog-producer-full-compaction.png">}}
 
 {{< hint info >}}
 
-Full compaction changelog producer can produce complete changelog for any type of source. However it is not as efficient as the input changelog producer and the latency to produce changelog might be high.
+Full compaction changelog producer can produce complete changelog for any type of source. However it is not as
+efficient as the input changelog producer and the latency to produce changelog might be high.
 
 {{< /hint >}}
 
 Full-compaction changelog-producer supports `changelog-producer.row-deduplicate` to avoid generating -U, +U
 changelog for the same record.
-
-(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration, this is very
-important for performance).
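
Reviewer note: the Full Compaction text in this patch says Paimon "will compare the results between full compactions and produce the differences as changelog". The sketch below illustrates that idea only. It is not Paimon's implementation: the function `diff_snapshots`, the `Row` tuple, and the representation of a table as a dict from primary key to value are all hypothetical. It assumes Flink-style row kinds (`+I`, `-U`, `+U`, `-D`) and always skips records whose value did not change, which is roughly the effect the patch attributes to `changelog-producer.row-deduplicate`.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape for illustration: (row kind, primary key, value).
Row = Tuple[str, int, str]


def diff_snapshots(before: Dict[int, str], after: Dict[int, str]) -> List[Row]:
    """Return changelog rows that transform `before` into `after`.

    Both arguments stand in for the table contents produced by two
    consecutive full compactions, keyed by primary key.
    """
    changelog: List[Row] = []
    # Visit keys in sorted order so the output is deterministic.
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            # Key appeared since the previous full compaction.
            changelog.append(("+I", key, after[key]))
        elif key not in after:
            # Key disappeared since the previous full compaction.
            changelog.append(("-D", key, before[key]))
        elif before[key] != after[key]:
            # Value changed: retract the old row, then emit the new one.
            changelog.append(("-U", key, before[key]))
            changelog.append(("+U", key, after[key]))
        # Unchanged records produce no changelog rows.
    return changelog


if __name__ == "__main__":
    previous = {1: "a", 2: "b", 3: "c"}
    current = {1: "a", 2: "B", 4: "d"}
    for row in diff_snapshots(previous, current):
        print(row)
```

Because the comparison needs the result of a full compaction on both sides, the changelog in this model can only be as fresh as the most recent full compaction, which matches the patch's statement that changelog latency is affected by the frequency of full compactions.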