Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure ACKs are sent after backup #920

Closed
wants to merge 4 commits into from

Conversation

dt8269
Copy link
Contributor

@dt8269 dt8269 commented Sep 30, 2024

Summary

This PR addresses an issue where acknowledgments (ACKs) were sometimes sent to the master before binlog events were fully written and fsynced to disk during backup operations. Sending ACKs prematurely in semi-synchronous replication could lead to data loss if the replica fails after sending the ACK but before persisting the event.

ACK after backup

  • Ensured that ACKs are sent only after the event has been fully processed and fsynced by sending the ACK after event handling

Event handling

  • Introduced an EventHandler interface with a HandleEvent method for processing binlog events. This allows custom event handling logic to be injected into the replication stream
  • Accordingly, added an eventHandler field and SetEventHandler method to BinlogSyncer.
  • Implemented BackupEventHandler which writes binlog events to disk and ensures that each event is fsynced before returning. This ensures data durability before ACKs are sent.

Separating parsing from handling

  • Modified the onStream method in BinlogSyncer to separate event parsing (parseEvent) from event handling and ACK sending (handleEventAndACK). This adheres to the single-responsibility principle and makes the code cleaner
  • Moved state updates (e.g., updating b.nextPos) and GTIDSet handling from parseEvent to handleEventAndACK to avoid side effects during parsing. Something called parseEvent should only be parsing, not modifying state or sending ACKs.

Testing

Three new tests:

  • TestGTIDSetHandling - This confirms the continued functionality of the existing GTIDSet logic after it was involved in a refactor
  • TestACKSentAfterFsync - This confirms that ACKs are only send after the backup files are fsynced
  • TestBackupEventHandlerInvocation - This confirms that the (new) EventHandler works as expected
❯ go test
PASS
ok      github.com/go-mysql-org/go-mysql/replication    0.703s

Dylan Terry added 2 commits September 30, 2024 14:30
This commit addresses an issue where acknowledgments (ACKs) were sometimes sent to the master before binlog events were fully written and fsynced to disk during backup operations. Sending ACKs prematurely in semi-synchronous replication could lead to data loss if the replica fails after sending the ACK but before persisting the event.

Key changes:

- Introduced an `EventHandler` interface with a `HandleEvent` method for
  processing binlog events. This allows custom event handling logic to
  be injected into the replication stream.

- Added an `eventHandler` field to `BinlogSyncer` and provided a
  `SetEventHandler` method to assign an event handler. This enables
  `BinlogSyncer` to delegate event processing to the assigned handler.

- Implemented `BackupEventHandler` which writes binlog events to disk
  and ensures that each event is fsynced before returning. This ensures
  data durability before ACKs are sent.

- Modified the `onStream` method in `BinlogSyncer` to separate event
  parsing (`parseEvent`) from event handling and ACK sending
  (`handleEventAndACK`). This adheres to the single-responsibility
  principle and makes the code cleaner.

- Moved state updates (e.g., updating `b.nextPos`) and GTID set handling
  from `parseEvent` to `handleEventAndACK` to avoid side effects during
  parsing.

- Ensured that ACKs are sent only after the event has been fully
  processed and fsynced by sending the ACK in `handleEventAndACK` after
  event handling.
…dling

- Updated `backup_test.go` to ensure binlog files are written and fsynced before ACKs are sent by checking that backup files exist and have non-zero sizes after the backup process completes
- Implemented `CountingEventHandler` in `backup_test.go` to confirm that the `EventHandler` is invoked during backups, ensuring events are processed as expected
- Added `TestGTIDSetHandling` in `replication_test.go` to verify that GTIDSets are correctly handled after refactoring
- Modified `testSync` method in `replication_test.go` to use a custom `EventHandler` and verify that events are processed correctly before ACKs are sent
@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch from b0273aa to 1b08ee2 Compare September 30, 2024 19:12
@dt8269 dt8269 changed the title Ensure ACKs are sent after fsync in backup and refactor event handling Ensure ACKs are sent after backup Sep 30, 2024
@lance6716
Copy link
Collaborator

maybe we can limit the user to either use the new blocking handler or use the existing asynchronous streaming consumer. in your use case of persisting to disk, no need to bother draining the stream by GetEvent

to achieve this, don't send the event to the channel if handler is set. and refine other logic and comments.

@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch 7 times, most recently from 1ff1b33 to a23b791 Compare October 2, 2024 18:04
@dt8269 dt8269 force-pushed the dterry/durablebinlogs branch from a23b791 to 591a1c7 Compare October 2, 2024 18:15
@dt8269 dt8269 closed this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants