
[ODS-6319] Modify retry policy for deadlocks to wrap just the NHibernate persistence operations #1013

Merged: 4 commits merged into main on Apr 26, 2024

Conversation

gmcelhanon (Contributor)

No description provided.

axelmarquezh (Contributor) left a comment:


With these changes, we will no longer execute the whole pipeline when a deadlock occurs, meaning that some scenarios will produce a different result.

For example, 2 clients try to insert a resource with the same id; for whatever reason, one client gets deadlocked, so it retries. Before this change, the pipeline would trigger an update; after this change, it will fail with a duplicate key violation because it retries the insert.

Another example is when a client tries to update a resource that another client deleted. Before this change, it would return a 404 on the retry; after this change, it will return a 204.

I'm not sure how often deadlocks happen in the field, and what scenarios cause them, so your change might be beneficial for most of the scenarios that actually happen.

Thoughts?

…plementations.

# Conflicts:
#	Application/EdFi.Ods.Api/EdFi.Ods.Api.csproj
#	Application/EdFi.Ods.Common/EdFi.Ods.Common.csproj
#	Application/EdFi.Ods.Common/Infrastructure/Repositories/CreateEntity.cs
#	Application/EdFi.Ods.Common/Infrastructure/Repositories/UpdateEntity.cs
gmcelhanon (Contributor, Author) commented on Apr 25, 2024:

> With these changes, we will no longer execute the whole pipeline when a deadlock occurs, meaning that some scenarios will produce a different result.
>
> For example, 2 clients try to insert a resource with the same id; for whatever reason, one client gets deadlocked, so it retries. Before this change, the pipeline would trigger an update; after this change, it will fail with a duplicate key violation because it retries the insert.
>
> Another example is when a client tries to update a resource that another client deleted. Before this change, it would return a 404 on the retry; after this change, it will return a 204.
>
> I'm not sure how often deadlocks happen in the field, and what scenarios cause them, so your change might be beneficial for most of the scenarios that actually happen.
>
> Thoughts?

I think I would characterize what you are describing here as race conditions rather than database deadlocks; the scenario this code is explicitly handling is the latter. Deadlocks occur when two transactions are each waiting on the other to release locks on resources, the database engine detects the cycle, and it arbitrarily kills one of the transactions, which surfaces as a very specific type of exception.

I think the scenarios you're describing above would most likely surface as regular SQL exceptions rather than deadlocks (e.g. an INSERT failing due to an attempt to insert a duplicate key, or NHibernate detecting that an UPDATE failed because no records were modified).
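For reference, here is a minimal sketch of how the two failure modes can be told apart in code. This is illustrative only, not the ODS API's actual exception handling: SQL Server reports error 1205 for the deadlock victim and 2601/2627 for duplicate keys, and NHibernate typically wraps the underlying SqlException, so the inner exception also has to be checked.

```csharp
// Illustrative only: distinguishing a deadlock-victim error from an ordinary
// duplicate-key violation on SQL Server.
using System;
using Microsoft.Data.SqlClient;

static class SqlExceptionClassifier
{
    // Unwraps a single layer of wrapping (e.g. NHibernate's ADO exception) to reach the SqlException.
    private static SqlException? AsSqlException(Exception ex)
        => ex as SqlException ?? ex.InnerException as SqlException;

    // Error 1205: this transaction was chosen as the deadlock victim.
    public static bool IsDeadlockVictim(Exception ex)
        => AsSqlException(ex)?.Number == 1205;

    // Errors 2601/2627: duplicate key in a unique index / unique or primary key constraint violation.
    public static bool IsDuplicateKeyViolation(Exception ex)
        => AsSqlException(ex)?.Number is 2601 or 2627;
}
```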

As for why we're seeing deadlocks (with TEA's implementation), there's still more work to be done to analyze that. There's a separate ticket to drop the indexes from the authorization views we have in SQL Server because this has been known to cause problems: https://dba.stackexchange.com/questions/151049/how-to-resolve-sql-server-deadlocks-involving-concurrent-inserts/151442#151442

Since I believe we're dealing with scenarios where the deadlocks are happening because of locks on related resources, and not the scenarios you listed above, it is better to just retry the current database operation than to go back through the whole "upsert" process: re-running all the queries to load the resource's data back into memory, performing the separate authorization round-trip query again, and finally proceeding with (probably) the exact same persistence operation. In all likelihood, that just represents a bunch of redundant work that increases the load on the server at a time when we know it is struggling with the volume of transactions being executed.
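To illustrate the shape of the approach (a hypothetical sketch with made-up names, not the actual code in this PR), a Polly-style retry policy that wraps only the NHibernate persistence operation, and only fires on deadlock-victim errors, could look roughly like this:

```csharp
// Hypothetical sketch: retry only the NHibernate persistence operation when the
// database reports a deadlock, instead of re-running the whole "upsert" pipeline.
using System;
using Microsoft.Data.SqlClient;
using NHibernate;
using Polly;

public class DeadlockRetryingWriter<TEntity>
{
    private readonly ISessionFactory _sessionFactory;

    public DeadlockRetryingWriter(ISessionFactory sessionFactory)
        => _sessionFactory = sessionFactory;

    public void Save(TEntity entity)
    {
        // Retry a few times with a short, growing delay, but only when this
        // transaction was chosen as the deadlock victim (SQL Server error 1205).
        var retryPolicy = Policy
            .Handle<Exception>(IsDeadlockVictim)
            .WaitAndRetry(
                retryCount: 3,
                sleepDurationProvider: attempt => TimeSpan.FromMilliseconds(100 * attempt));

        retryPolicy.Execute(() =>
        {
            // Each attempt uses a fresh session and transaction so the retried
            // INSERT/UPDATE runs against a clean unit of work.
            using var session = _sessionFactory.OpenSession();
            using var tx = session.BeginTransaction();
            session.Save(entity);
            tx.Commit();
        });
    }

    private static bool IsDeadlockVictim(Exception ex)
        => (ex as SqlException ?? ex.InnerException as SqlException)?.Number == 1205;
}
```

The key point is that only the innermost persistence call sits inside the retry; the mapping, validation, and authorization steps higher up the pipeline are not repeated on a deadlock retry.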

Does that answer satisfy your concerns?

axelmarquezh merged commit 13765ad into main on Apr 26, 2024
16 checks passed
axelmarquezh deleted the ODS-6319 branch on April 26, 2024 at 00:07