diff --git a/docs/content/append-table/query.md b/docs/content/append-table/query-performance.md
similarity index 96%
rename from docs/content/append-table/query.md
rename to docs/content/append-table/query-performance.md
index 4f7e0ab9f66f..32ea806b864a 100644
--- a/docs/content/append-table/query.md
+++ b/docs/content/append-table/query-performance.md
@@ -1,9 +1,9 @@
 ---
-title: "Query"
+title: "Query Performance"
 weight: 3
 type: docs
 aliases:
-- /append-table/query.html
+- /append-table/query-performance.html
 ---
-# Query
+# Query Performance
 ## Data Skipping By Order
@@ -57,8 +57,6 @@ multiple columns.
 Different file index may be efficient in different scenario. For example bloom filter may speed up query in point lookup
 scenario. Using a bitmap may consume more space but can result in greater accuracy.

-Currently, file index is only supported in append-only table.
-
 `Bloom Filter`:
 * `file-index.bloom-filter.columns`: specify the columns that need bloom filter index.
 * `file-index.bloom-filter.<column_name>.fpp` to config false positive probability.
diff --git a/docs/content/primary-key-table/query-performance.md b/docs/content/primary-key-table/query-performance.md
new file mode 100644
index 000000000000..971be8ae66bb
--- /dev/null
+++ b/docs/content/primary-key-table/query-performance.md
@@ -0,0 +1,72 @@
+---
+title: "Query Performance"
+weight: 8
+type: docs
+aliases:
+- /primary-key-table/query-performance.html
+---
+
+# Query Performance
+
+## Table Mode
+
+The table schema has the greatest impact on query performance. See [Table Mode]({{< ref "primary-key-table/table-mode" >}}).
+
+For a Merge On Read table, the most important thing to pay attention to is the number of buckets, which limits
+the concurrency of reading data.
+
+For MOW (Deletion Vectors) tables, COW tables, or [Read Optimized]({{< ref "maintenance/system-tables#read-optimized-table" >}}) tables,
+there is no limit to the concurrency of reading data, and they can also utilize filtering conditions on
+non-primary-key columns.
+
+## Data Skipping By Primary Key Filter
+
+For a regular bucketed table (for example, bucket = 5), filtering conditions on the primary key will greatly
+accelerate queries and reduce the number of files read.
+
+## Data Skipping By File Index
+
+You can add a file index to a table with Deletion Vectors enabled; it filters files by index on the read side.
+
+```sql
+CREATE TABLE <PAIMON_TABLE> WITH (
+    'deletion-vectors.enabled' = 'true',
+    'file-index.bloom-filter.columns' = 'c1,c2',
+    'file-index.bloom-filter.c1.items' = '200'
+);
+```
+
+Supported filter types:
+
+`Bloom Filter`:
+* `file-index.bloom-filter.columns`: specify the columns that need a bloom filter index.
+* `file-index.bloom-filter.<column_name>.fpp` to config the false positive probability.
+* `file-index.bloom-filter.<column_name>.items` to config the expected distinct items in one data file.
+
+`Bitmap`:
+* `file-index.bitmap.columns`: specify the columns that need a bitmap index.
+
+More filter types will be supported...
+
+If you want to add a file index to an existing table without rewriting all files, you can use the `rewrite_file_index`
+procedure. Before using the procedure, you should set the appropriate configurations on the target table: use the
+ALTER TABLE clause to set `file-index.<filter-type>.columns` on the table.
+
+How to invoke: see [flink procedures]({{< ref "flink/procedures#procedures" >}})
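
The two-step workflow described at the end of the new page (configure the index via ALTER, then rebuild indexes for existing files) can be sketched in Flink SQL. This is a hedged sketch, not part of the diff: the table name `T`, database `default`, and column `c1` are hypothetical, and the exact arguments of the procedure should be verified against the Flink procedures page referenced above.

```sql
-- Step 1: record the index configuration on the target table
-- (hypothetical table T and column c1; this only updates table options,
-- it does not touch already-written data files).
ALTER TABLE T SET ('file-index.bloom-filter.columns' = 'c1');

-- Step 2: rewrite file indexes so existing data files also get the
-- configured index (assumed identifier-style argument).
CALL sys.rewrite_file_index('default.T');
```

New writes pick up the index configuration automatically after step 1; the procedure in step 2 is only needed to backfill indexes for files written before the configuration change.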