[doc] Recommend to use 200MB bucket size
JingsongLi committed Dec 6, 2023
1 parent 182d7c4 commit 71057f3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/content/concepts/basic-concepts.md
@@ -50,7 +50,7 @@ Unpartitioned tables, or partitions in partitioned tables, are sub-divided into

The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the [`bucket-key` option]({{< ref "maintenance/configurations#coreoptions" >}}). If no `bucket-key` option is specified, the primary key (if defined) or the complete record will be used as the bucket key.

- A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 1GB.
+ A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 200MB - 1GB.

See [file layouts]({{< ref "concepts/file-layouts" >}}) for how files are divided into buckets. Also, see [rescale bucket]({{< ref "maintenance/rescale-bucket" >}}) if you want to adjust the number of buckets after a table is created.
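
As a rough illustration of the 200MB - 1GB guidance in the changed line, the following Python sketch estimates a range of bucket counts from an expected data size. It is not part of the commit or of the project's tooling: the function name `bucket_count_range` and its parameters are hypothetical, sizes are assumed to use binary units (1GB = 1024MB), and the input is assumed to be the data size of one unpartitioned table or one partition.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def bucket_count_range(total_bytes, min_bucket_bytes=200 * MB, max_bucket_bytes=1 * GB):
    """Return (fewest, most) bucket counts that keep each bucket
    roughly within [min_bucket_bytes, max_bucket_bytes].

    Hypothetical helper for illustration only; it assumes data is
    spread evenly across buckets, which a skewed bucket key will not give.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Fewest buckets: each bucket holds at most max_bucket_bytes.
    fewest = max(1, math.ceil(total_bytes / max_bucket_bytes))
    # Most buckets: each bucket holds at least min_bucket_bytes.
    most = max(1, total_bytes // min_bucket_bytes)
    # Small inputs cannot satisfy both bounds; fall back to the lower count.
    return fewest, max(fewest, most)


if __name__ == "__main__":
    fewest, most = bucket_count_range(100 * GB)
    print(f"100GB of data: between {fewest} and {most} buckets")
```

Under these assumptions, 100GB of data maps to somewhere between 100 and 512 buckets. The lower end favors fewer, larger files; the upper end allows more parallelism at the cost of more small files.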
