Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(storage-manager): race condition on latest_updates structure #1399

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

J-Loudet
Copy link
Contributor

As the lock on the latest_updates structure was dropped before the operation on the Storage was performed and re-acquired afterwards, a race condition could have happened as illustrated in the following scenario.


Let us assume that the Storage and latest_updates have (A, T_A0).

A first operation is performed with T_A1 > T_A0. As the timestamp is greater, latest_updates will allow it. The lock on latest_updates is released.

The Storage is updated and has (A, T_A1).

In the meantime, another operation is received with T_A2 > T_A0. As the timestamp is greater than T_A0 (latest_updates was still not updated!), it is allowed.

For scheduling reasons, the second operation is performed entirely: the Storage and latest_updates both have (A, T_A2).

The scheduler goes back to finish T_A1 and inserts in latest_updates the pair (A, T_A1).

The information in the Storage ((A, T_A2)) and in latest_updates ((A, T_A1)) are no longer coherent.


This commit fixes this issue by only releasing the lock on the latest_updates structure after having updated it — instead of after checking if the received operation is more recent.

Note that this should create very little additional contention: the bulk of the processing is performed when interacting with the Storage, not when checking / updating the latest_updates structure.

  • plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: only release the lock on the latest_updates structure after having updated it.

@J-Loudet J-Loudet requested a review from JEnoch September 11, 2024 11:06
Copy link

PR missing one of the required labels: {'internal', 'dependencies', 'bug', 'breaking-change', 'new feature', 'documentation', 'enhancement'}

@J-Loudet J-Loudet added the bug Something isn't working label Sep 11, 2024
@J-Loudet J-Loudet force-pushed the fix/storage-manager/race-condition-latest-updates branch from 12b2075 to 143ac71 Compare September 11, 2024 13:40
As the lock on the `latest_updates` structure was dropped before the
operation on the Storage was performed and re-acquired afterwards, a
race condition could have happened as illustrated in the following
scenario.

---

Let us assume that the Storage and `latest_updates` have `(A, T_A0)`.

A first operation is performed with `T_A1 > T_A0`. As the timestamp is
greater, `latest_updates` will allow it. The lock on `latest_updates` is
released.

The Storage is updated and has `(A, T_A1)`.

In the meantime, another operation is received with `T_A2 > T_A0`. As
the timestamp is greater than `T_A0` (`latest_updates` was still not
updated!), it is allowed.

For scheduling reasons, the second operation is performed *entirely*:
the Storage and `latest_updates` both have `(A, T_A2)`.

The scheduler goes back to finish `T_A1` and inserts in `latest_updates`
the pair `(A, T_A1)`.

The information in the Storage (`(A, T_A2)`) and in
`latest_updates` (`(A, T_A1)`) are no longer coherent.

---

This commit fixes this issue by only releasing the lock on the
`latest_updates` structure after having updated it — instead of after
checking if the received operation is more recent.

Note that this should create very little additional contention: the bulk
of the processing is performed when interacting with the Storage, not
when checking / updating the `latest_updates` structure.

* plugins/zenoh-plugin-storage-manager/src/replica/storage.rs: only
  release the lock on the `latest_updates` structure after having
  updated it.

Signed-off-by: Julien Loudet <[email protected]>
@J-Loudet J-Loudet force-pushed the fix/storage-manager/race-condition-latest-updates branch from 143ac71 to 2bc10a2 Compare September 11, 2024 13:40
@J-Loudet J-Loudet requested a review from JEnoch September 11, 2024 13:52
@JEnoch JEnoch merged commit 31b3183 into main Sep 11, 2024
23 checks passed
fuzzypixelz pushed a commit that referenced this pull request Sep 23, 2024
fuzzypixelz pushed a commit that referenced this pull request Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants