Binlog: Improve ZstdInMemoryDecompressorMaxSize management #17220
Conversation
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #17220      +/-   ##
==========================================
+ Coverage   67.40%   67.41%   +0.01%
==========================================
  Files        1570     1570
  Lines      252903   252917      +14
==========================================
+ Hits       170460   170501      +41
+ Misses      82443    82416      -27
```

☔ View full report in Codecov by Sentry.
…essed events larger than that. Moving to `zstd.NewReader(nil, zstd.WithDecoderLowmem(true), zstd.WithDecoderConcurrency(1))` allows us to process the payload using the least amount of memory possible. Signed-off-by: Matt Lord <[email protected]>
The uncompressed size, read from the header, is what dictates the decompression/decoding method. Signed-off-by: Matt Lord <[email protected]>
```diff
@@ -94,4 +95,6 @@ func registerFlags(fs *pflag.FlagSet) {
 	fs.BoolVar(&vreplicationStoreCompressedGTID, "vreplication_store_compressed_gtid", vreplicationStoreCompressedGTID, "Store compressed gtids in the pos column of the sidecar database's vreplication table")
 	fs.IntVar(&vreplicationParallelInsertWorkers, "vreplication-parallel-insert-workers", vreplicationParallelInsertWorkers, "Number of parallel insertion workers to use during copy phase. Set <= 1 to disable parallelism, or > 1 to enable concurrent insertion during copy phase.")
+	fs.Uint64Var(&mysql.ZstdInMemoryDecompressorMaxSize, "binlog-in-memory-decompressor-max-size", mysql.ZstdInMemoryDecompressorMaxSize, "This value sets the uncompressed transaction payload size at which we switch from in-memory buffer based decompression to the slower streaming mode.")
```
Let's add validation to the entered value - there have to be some bounds I guess? And first read it into a local variable before assigning it into `mysql.ZstdInMemoryDecompressorMaxSize`.
The bounds are set by the type. If it’s 0 then streaming mode will always be used. If it’s the max then the in-memory buffers will always be used.
Why would we read it into a local variable first?
> Why would we read it into a local variable first?

We'd need to if there was a need for boundary checking. But per your comment boundary checking is not required, so a local variable is not needed.
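To illustrate the conclusion of this thread, here is a minimal, runnable sketch (the variable name mirrors the real `mysql.ZstdInMemoryDecompressorMaxSize`, but this is not the PR's actual code): the flag binds directly to the exported package variable because every `uint64` value is in bounds, with `0` forcing streaming for all payloads and `math.MaxUint64` forcing the in-memory path.

```go
package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

// ZstdInMemoryDecompressorMaxSize stands in for mysql.ZstdInMemoryDecompressorMaxSize;
// 128MiB matches the previously hardcoded default.
var ZstdInMemoryDecompressorMaxSize uint64 = 128 << 20

func main() {
	fs := pflag.NewFlagSet("vttablet", pflag.ContinueOnError)
	// Binding directly into the package variable is safe here: every uint64
	// is a legal threshold, so no validation (and no local copy) is needed.
	fs.Uint64Var(&ZstdInMemoryDecompressorMaxSize,
		"binlog-in-memory-decompressor-max-size", ZstdInMemoryDecompressorMaxSize,
		"Uncompressed payload size at which decompression switches to streaming mode.")
	if err := fs.Parse([]string{"--binlog-in-memory-decompressor-max-size=0"}); err != nil {
		panic(err)
	}
	// 0 means the streaming path is always used; math.MaxUint64 would mean
	// the in-memory buffers are always used.
	fmt.Println(ZstdInMemoryDecompressorMaxSize) // prints 0
}
```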
…ment (#17220) (#17241) Signed-off-by: Matt Lord <[email protected]> Co-authored-by: vitess-bot[bot] <108069721+vitess-bot[bot]@users.noreply.github.com>
…17220) Signed-off-by: Matt Lord <[email protected]>
…17220) Signed-off-by: Matt Lord <[email protected]> Signed-off-by: Renan Rangel <[email protected]>
Description

Vitess supports MySQL's binlog transaction compression. That support lives primarily in a single file: https://github.com/vitessio/vitess/blob/main/go/mysql/binlog_event_compression.go

There's an important variable which controls HOW that work is done. That variable is currently a `const`:

vitess/go/mysql/binlog_event_compression.go, lines 62 to 65 in f6067e0

That is currently hardcoded at 128MiB, which was somewhat arbitrary. The thinking was that all transactions will be compressed and you want to process them as fast as possible, while still being able to support payloads of virtually any size: the compressed transaction payload is limited by MySQL's `max_allowed_packet` size, but the size of the uncompressed payload is not strictly limited.

The in-memory buffer based decoding is fast but memory intensive, so the 128MiB threshold can be too high for environments that are memory constrained (e.g. 512MiB of memory or less). This PR makes this setting configurable via a new vttablet flag, `--binlog-in-memory-decompressor-max-size`, so that it can be tuned at the `vttablet` level based on the details of the execution environment for the process.
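As a rough sketch of what this threshold controls (assumed names only; the PR's real logic lives in go/mysql/binlog_event_compression.go):

```go
package main

import "fmt"

// zstdInMemoryDecompressorMaxSize stands in for the now flag-configurable
// mysql.ZstdInMemoryDecompressorMaxSize.
var zstdInMemoryDecompressorMaxSize uint64 = 128 << 20 // 128MiB default

// decompressionMode picks a path from the uncompressed size that is read
// out of the compressed transaction payload's header.
func decompressionMode(uncompressedSize uint64) string {
	if uncompressedSize <= zstdInMemoryDecompressorMaxSize {
		return "in-memory buffer (fast, memory intensive)"
	}
	return "streaming (slower, bounded memory)"
}

func main() {
	fmt.Println(decompressionMode(16 << 20))  // small payload: in-memory
	fmt.Println(decompressionMode(512 << 20)) // large payload: streaming
}
```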
Note

This PR also changes how we handle the larger payloads. In local testing done for this PR I realized that the MaxMemory and/or MaxWindowSize options (same for streaming) that were added in v21+ via #16328 mean that we CANNOT process compressed payloads which have an uncompressed size larger than the given size (example here). So this PR moves to this for the streaming method:

`zstd.NewReader(nil, zstd.WithDecoderLowmem(true), zstd.WithDecoderConcurrency(1))`

This allows us to process the large payload (> ZstdInMemoryDecompressorMaxSize), no matter the size, using the least amount of memory possible, as it instructs the reader to limit memory allocations and to keep only 1 window or block in flight.
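For reference, a hedged, self-contained sketch of that pattern against github.com/klauspost/compress/zstd (the zstd calls match that library's API; the surrounding names and the round-trip in main are purely illustrative): the decoder is created once with a nil reader, then Reset onto each oversized payload and drained as a stream.

```go
package main

import (
	"bytes"
	"io"
	"log"

	"github.com/klauspost/compress/zstd"
)

// One long-lived, low-memory, single-goroutine decoder, created with a nil
// reader exactly as quoted above, then Reset once per payload.
var streamingDecoder, _ = zstd.NewReader(nil,
	zstd.WithDecoderLowmem(true), zstd.WithDecoderConcurrency(1))

// streamDecompress decodes one compressed payload of any uncompressed size,
// keeping at most one window/block in flight instead of buffering it all.
func streamDecompress(compressed []byte, sink io.Writer) error {
	if err := streamingDecoder.Reset(bytes.NewReader(compressed)); err != nil {
		return err
	}
	_, err := io.Copy(sink, streamingDecoder)
	return err
}

func main() {
	// Round-trip a tiny payload just to show the mechanics.
	enc, _ := zstd.NewWriter(nil)
	compressed := enc.EncodeAll([]byte("hello binlog"), nil)

	var out bytes.Buffer
	if err := streamDecompress(compressed, &out); err != nil {
		log.Fatal(err)
	}
	log.Println(out.String()) // hello binlog
}
```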
It's for this reason that, while this isn't the kind of thing we would normally backport (a new flag), the new flag and the noted change above are critical for those using MySQL with `--binlog_transaction_compression`. And the `zstd.WithDecoderMaxMemory` usage is new in v21 via #16328, so I think we should backport this to v21. You can see the failure users could encounter without it on main here: https://gist.github.com/mattlord/17c7dbf7985b8805bb6db3efcbaf2218
Related Issue(s)

`zstdInMemoryDecompressorMaxSize` #17219

Checklist