Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iox #325 deadlock in shared mem mutex #330

Conversation

marthtz
Copy link
Contributor

@marthtz marthtz commented Nov 2, 2020

Introduce a robust, recursive mutex.
If recursive is enabled, the same thread, which has already locked the mutex, can lock the mutex again without getting blocked.
If robust is enabled, a process or thread can exit while having the lock without causing an invalid mutex. In the next lock call to this mutex the system recognizes the mutex is locked by a dead process or thread and allows the mutex to be restored.

Sumedha Maharudrayya Mathad (RBEI/ESU4) and others added 4 commits November 2, 2020 11:29
…case application fails to unlock mutex and terminate

Signed-off-by: Sumedha Maharudrayya Mathad (RBEI/ESU4) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
@marthtz marthtz force-pushed the iox-#325-deadlock-in-shared-mem-mutex branch from 6dba6b2 to ecac6fd Compare November 2, 2020 10:30
@marthtz marthtz requested a review from elfenpiff November 2, 2020 11:04
@marthtz marthtz linked an issue Nov 2, 2020 that may be closed by this pull request
@marthtz
Copy link
Contributor Author

marthtz commented Nov 2, 2020

Robust mutex is not supported on Win/Mac. We need to find a solution!

@mossmaurice
Copy link
Contributor

Windows and Mac builds are broken :-( IMHO best solution would be to add empty wrappers for e.g. pthread_mutexattr_setrobust in the iceoryx_utils/platform/*/include/iceoryx_utils/platform/pthread.hpp

@budrus
Copy link
Contributor

budrus commented Nov 2, 2020

This will prevent a deadlock and makes the mutex consistent again.
But how to deal with the problem that the process that died while holding the mutex could leave some shared data objects in an inconsistent state. The locking will work now but depending on the critical section and the involved shared data objects this opens the road to undefined behavior.

What's your feeling on this @marthtz @sculpordwarf ?

@sculpordwarf
Copy link
Contributor

Windows: WAIT_ABANDONED
Mac: EDEADLK -> you need to make an example an step in with the debugger, there isn´t enough docu

@marthtz
Copy link
Contributor Author

marthtz commented Nov 3, 2020

What's your feeling on this @marthtz @sculpordwarf ?

You're right @budrus, we're getting into difficult terrain here. All shared data protected by an invalid mutex should be treated as corrupted. Off the top of my head we could do two things when an invalid mutex has been detected. We could either re-init these data blocks or we throw an error (and may shutdown gracefully).

Copy link
Contributor

@elfenpiff elfenpiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix Mac OS behavior so that is consistent with linux (if possible)
  • Find robust mutex solution for windows (if possible)
  • Add additional unit tests for mutex

/// @param[in] robust If robust is set ON, a process or thread can exit while having the lock without causing an
/// invalid mutex. In the next lock call to this mutex the system recognizes the mutex is locked by a dead process
/// or thread and allows the mutex to be restored.
mutex(const Recursive recursive, const Robust robust);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add noexcept

@@ -50,6 +50,16 @@ mutex::mutex(bool f_isRecursive)
&attr,
PTHREAD_PRIO_NONE)
.hasErrors();
if (robust == Robust::ON)
{
isInitialized &= !cxx::makeSmartC(pthread_mutexattr_setrobust,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this call is not supported for Mac OS and therefore has to be reimplemented and I am not aware if Windows is supporting robust mutexes?!

But maybe you can follow up this link: https://stackoverflow.com/questions/24946886/is-there-pthread-mutex-robust-equivalent-in-mac-os-x or contact me and we can work together on a solution.

@@ -47,7 +48,7 @@ class Mutex_test : public Test
}
}

iox::posix::mutex sut{false};
mutex_t sut{mutex_t::Recursive::OFF, mutex_t::Robust::OFF};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases for recursive and robust mutexes are missing.
Like:

  • normal creation and usage should work for recursive, robust and recursive and robust mutex
  • multiple locks should not lead to deadlock in a recursive mutex in a single thread
  • multiple locks should cause a deadlock in a recursive mutex from different threads (start multiple threads, let the first thread deadlock and then unlock the mutex in the thread it was locked before)
  • robust mutex cleanup should work ... here we maybe have to think how we can test it. @marthtz do you have an idea?

@marthtz
Copy link
Contributor Author

marthtz commented Nov 3, 2020

Isn't it the case that with the introduction of the building blocks / new API the number of mutexes is reduced?! So, maybe the robust approach is a bit over the top?

@budrus
Copy link
Contributor

budrus commented Nov 4, 2020

You're right @budrus, we're getting into difficult terrain here. All shared data protected by an invalid mutex should be treated as corrupted. Off the top of my head we could do two things when an invalid mutex has been detected. We could either re-init these data blocks or we throw an error (and may shutdown gracefully).

So we would have to provide this information and not silently make the mutex consistent again.

@budrus
Copy link
Contributor

budrus commented Nov 4, 2020

Isn't it the case that with the introduction of the building blocks / new API the number of mutexes is reduced?! So, maybe the robust approach is a bit over the top?

They are reduced but not completely eliminated

@mossmaurice mossmaurice added this to the v1.x milestone Nov 4, 2020
@budrus
Copy link
Contributor

budrus commented Nov 5, 2020

My proposal after some discussion with @sculpordwarf :

  • Windows and Mac. No support of robust mutex so far. We would fall back to the deadlock situation as we have today
  • We need the information whether a mutex was held by a dead application and the lock() call made it consistent again. The user of the mutex needs to be able to decide what the consequence is. I.e. if the data protected by the mutex could be inconsistent or hast to be reset etc. The idea is to introduce a was_inconsistent() method that can be called after lock
  • For the old ports we came to the conclusion that a robust mutex is fine for now. For the locking policy in the new building blocks we start with a non-robust mutex and first have to analyze if and how the robust mutex could help
  • For tests we think it should be possible to have a global mutex and several threads to force the interesting situations

@dkroenke dkroenke marked this pull request as draft January 7, 2021 17:04
@elfenpiff
Copy link
Contributor

@marthtz is this PR dead? Can we remove it?

@elfenpiff
Copy link
Contributor

@marthtz I assume that this PR is dead and remove it. If I am mistaken just reopen it.

@elfenpiff elfenpiff closed this Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Solves a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock in shared mem mutex when apps are killed
5 participants