[flink] Add Clustering section in doc to introduce range partition and sort
apache · May 28, 2024 · 860bc0f
1 parent 7e4c2f5
commit 860bc0f
Showing 1 changed file with 32 additions and 0 deletions.
diff --git a/docs/content/flink/sql-write.md b/docs/content/flink/sql-write.md
@@ -49,6 +49,38 @@ snapshot expiration, and even partition expiration in Flink Sink (if it is confi
 
 For multiple jobs to write the same table, you can refer to [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.
 
+### Clustering
+
+In Paimon, clustering lets you organize the data in your [Append Table]({{< ref "append-table/append-table#Append Table" >}}) 
+by the values of certain columns during the write process. This layout can make downstream reads more efficient, 
+since rows with similar values are stored together and can be retrieved in a more targeted way.
+
+To use clustering, specify the columns to cluster by when creating or writing to a table. Here's a simple example of how to enable clustering:
+
+```sql
+CREATE TABLE my_table (
+    a STRING,
+    b STRING,
+    c STRING
+) WITH (
+  'sink.clustering.by-columns' = 'a,b'
+);
+```
+
+You can also use SQL hints to dynamically set clustering options:
+
+```sql
+INSERT INTO my_table /*+ OPTIONS('sink.clustering.by-columns' = 'a,b') */
+SELECT * FROM source;
+```
+
+Clustering is implemented by range partitioning followed by a sort. Range partitioning relies on sampling; if sampling takes too much
+time in the job, you can decrease the total number of samples by setting `sink.clustering.sample-factor`. The comparison algorithm for range partitioning
+(ORDER, ZORDER, or HILBERT) is chosen automatically, but you can set it manually with `sink.clustering.strategy`. The sort step is enabled by default to
+further improve data ordering. If the sort slows down the job, you can set `sink.clustering.sort-in-cluster` to false.
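+
+As a rough sketch of how these options might be combined in a single write, the hint below sets all of them together. The values shown
+(`'zorder'`, `'50'`, `'false'`) are illustrative assumptions rather than recommendations, and the accepted values and defaults should be
+checked against the configuration reference before use:
+
+```sql
+-- Illustrative values only: pick the strategy manually, reduce the sample count,
+-- and skip the sort step inside each range (all three values are assumptions).
+INSERT INTO my_table /*+ OPTIONS(
+    'sink.clustering.by-columns' = 'a,b',
+    'sink.clustering.strategy' = 'zorder',
+    'sink.clustering.sample-factor' = '50',
+    'sink.clustering.sort-in-cluster' = 'false'
+) */
+SELECT * FROM source;
+```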
+
+You can refer to [FlinkConnectorOptions]({{< ref "maintenance/configurations#FlinkConnectorOptions" >}}) for more info about the configurations above.
+
 ## Overwriting the Whole Table
 
 For unpartitioned tables, Paimon supports overwriting the whole table.