Occasional hang due to deadlock when HALSimWS connection is present while SimDevice is being created. #6842

Closed
brettle opened this issue Jul 16, 2024 · 7 comments · Fixed by #6855
Labels
component: hal Hardware Abstraction Layer os: simulation type: bug Something isn't working.

Comments

@brettle
Contributor

brettle commented Jul 16, 2024

Describe the bug

If the HALSimWS server extension is enabled and a connection is present while the robot is creating a SimDevice, a deadlock can occur.

To Reproduce

Due to the race condition involved, it might not be possible to reproduce this reliably, but see below for stack traces showing the deadlock.

Steps to reproduce the behavior:

  1. Enable the HALSimWS server extension.
  2. In the robot code, after a HALSimWS connection is present, create a SimDevice. (To increase the chances of triggering the issue, it might help to create many SimDevices and to create them after some delay to ensure that a HALSimWS client has connected.)
  3. While a SimDevice is being created, connect a HALSimWS client.

Expected behavior
The code should continue executing normally (and the device should be created and be visible to the HALSimWS client).

Desktop (please complete the following information):

  • OS: Fedora Linux Workstation 40
  • Project Information:
    WPILib Information:
    Project Version: unknown
    VS Code Version: 1.85.1
    WPILib Extension Version: 2024.3.1
    C++ Extension Version: 1.19.1
    Java Extension Version: 1.32.0
    Java Debug Extension Version: 0.57.2024041008
    Java Dependencies Extension Version 0.23.6
    Java Version: 17
    Java Location: /home/brettle/wpilib/2024/jdk
    Vendor Libraries:

Additional context

After a deadlock occurred, I attached GDB to the java process. Here is the stack trace for the thread attempting to create the SimDevice. It appears to be blocked waiting for the AsyncFunction mutex while running HALSimWS's createDevice callback. Note that at this point it is holding a lock on the SimDeviceData mutex that it acquired in SimDeviceData::CreateDevice.

libc.so.6!futex_wait(unsigned int * futex_word, unsigned int expected, int private) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/nptl/futex-internal.h:146)
libc.so.6!__GI___lll_lock_wait(int * futex, int * futex@entry, int private) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/lowlevellock.c:49)
libc.so.6!lll_mutex_lock_optimized(pthread_mutex_t * mutex) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_mutex_lock.c:48)
libc.so.6!___pthread_mutex_lock(pthread_mutex_t * mutex) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_mutex_lock.c:93)
libhalsim_ws_server.so!__gthread_mutex_lock(__gthread_mutex_t * __mutex) (/usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
libhalsim_ws_server.so!std::mutex::lock(class std::mutex * const this) (/usr/include/c++/11/bits/std_mutex.h:100)
libhalsim_ws_server.so!std::scoped_lock<std::mutex>::scoped_lock(std::scoped_lock<std::mutex>::mutex_type & __m, class std::scoped_lock<std::mutex> * const this) (/usr/include/c++/11/mutex:655)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void(std::function<void()>)>::Call<wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()> >(class wpi::uv::AsyncFunction<void(std::function<void()>)> * const this) (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:145)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(class wpilibws::HALSimWSProviderSimDevices * const this, const char * name, HAL_SimDeviceHandle handle) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:277)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::Invoke<int&>(const char * name, const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/work/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:123)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::operator()<char const*&, int&>(const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/work/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:131)
libwpiHal.so!hal::SimDeviceData::CreateDevice(class hal::SimDeviceData * const this, const char * name) (/work/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:114)
libwpiHaljni.so!Java_edu_wpi_first_hal_SimDeviceJNI_createSimDevice(JNIEnv * env, jstring name) (/work/wpiutil/src/main/native/thirdparty/llvm/include/wpi/SmallVector.h:273)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)

The AsyncFunction mutex that the above thread is waiting for is locked by the HALSimWS server thread, which is running the DeviceCreated callback it registered during its initialization. It appears to be blocked waiting for the SimDeviceData mutex held by the earlier thread. Here is its stack trace:

libwpiHal.so!wpi::recursive_spinlock1::try_lock(wpi::recursive_spinlock1 * const this) (/work/wpiutil/src/main/native/include/wpi/spinlock.h:56)
libwpiHal.so!wpi::recursive_spinlock1::lock(wpi::recursive_spinlock1 * const this) (/work/wpiutil/src/main/native/include/wpi/spinlock.h:71)
libwpiHal.so!std::scoped_lock<wpi::recursive_spinlock1>::scoped_lock(std::scoped_lock<wpi::recursive_spinlock1>::mutex_type & __m, std::scoped_lock<wpi::recursive_spinlock1> * const this) (/usr/include/c++/11/mutex:655)
libwpiHal.so!hal::SimDeviceData::RegisterValueCreatedCallback(hal::SimDeviceData * const this, HAL_SimDeviceHandle device, void * param, HALSIM_SimValueCallback callback, bool initialNotify) (/work/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:346)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevice::OnNetworkConnected(wpilibws::HALSimWSProviderSimDevice * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> ws) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:37)
libhalsim_ws_server.so!operator()(const struct {...} * const __closure) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:277)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()>&>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()>&>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(), wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(char const*, HAL_SimDeviceHandle)::<lambda()> >::_M_invoke(const std::_Any_data &)(const std::_Any_data & __functor) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void ()>::operator()() const(const std::function<void()> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!operator()<wpi::promise<void> >(wpilibws::HALSimWSProviderSimDevices::LoopFn func, wpi::promise<void> out) (/work/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:293)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)>&, wpi::promise<void>, std::function<void()> >(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)>&, wpi::promise<void>, std::function<void()> >(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(wpi::promise<void>, std::function<void()>), wpilibws::HALSimWSProviderSimDevices::Initialize(wpi::uv::Loop&)::<lambda(auto:31, wpilibws::HALSimWSProviderSimDevices::LoopFn)> >::_M_invoke(const std::_Any_data &, wpi::promise<void> &&, std::function<void()> &&)(const std::_Any_data & __functor, wpi::promise<void> && __args#0, std::function<void()> && __args#1) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void (wpi::promise<void>, std::function<void ()>)>::operator()(wpi::promise<void>, std::function<void ()>) const(std::function<void()> __args#1, wpi::promise<void> __args#0, const std::function<void(wpi::promise<void>, std::function<void()>)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!std::__invoke_impl<void, std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>, std::function<void ()> >(std::__invoke_other, std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>&&, std::function<void ()>&&)(std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke<std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>, std::function<void ()> >(std::function<void (wpi::promise<void>, std::function<void ()>)>&, wpi::promise<void>&&, std::function<void ()>&&)(std::function<void(wpi::promise<void>, std::function<void()>)> & __fn) (/usr/include/c++/11/bits/invoke.h:96)
libhalsim_ws_server.so!std::__apply_impl<std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >, 0ul, 1ul>(std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >&&, std::integer_sequence<unsigned long, 0ul, 1ul>)(std::tuple<wpi::promise<void>, std::function<void()> > && __t, std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/tuple:1854)
libhalsim_ws_server.so!std::apply<std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> > >(std::function<void (wpi::promise<void>, std::function<void ()>)>&, std::tuple<wpi::promise<void>, std::function<void ()> >&&)(std::tuple<wpi::promise<void>, std::function<void()> > && __t, std::function<void(wpi::promise<void>, std::function<void()>)> & __f) (/usr/include/c++/11/tuple:1865)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void (std::function<void ()>)>::Create(std::shared_ptr<wpi::uv::Loop> const&, std::function<void (wpi::promise<void>, std::function<void ()>)>)::{lambda(uv_async_s*)#1}::operator()(uv_async_s*) const(uv_async_t * handle) (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:95)
libhalsim_ws_server.so!wpi::uv::AsyncFunction<void (std::function<void ()>)>::Create(std::shared_ptr<wpi::uv::Loop> const&, std::function<void (wpi::promise<void>, std::function<void ()>)>)::{lambda(uv_async_s*)#1}::_FUN(uv_async_s*)() (/work/wpinet/src/main/native/include/wpinet/uv/AsyncFunction.h:84)
libwpinet.so!uv__async_io(uv_loop_t * loop, uv__io_t * w, unsigned int events) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/async.cpp:177)
libwpinet.so!uv__io_poll(uv_loop_t * loop, uv_loop_t * loop@entry, int timeout) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/linux.cpp:1527)
libwpinet.so!uv_run(uv_loop_t * loop, uv_run_mode mode, uv_run_mode mode@entry) (/work/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:448)
libwpinet.so!wpi::uv::Loop::Run(wpi::uv::Loop::Mode mode, wpi::uv::Loop * const this) (/work/wpinet/src/main/native/include/wpinet/uv/Loop.h:113)
libwpinet.so!wpi::EventLoopRunner::Thread::Main(wpi::EventLoopRunner::Thread * const this) (/work/wpinet/src/main/native/cpp/EventLoopRunner.cpp:36)
libwpiutil.so!operator()(const struct {...} * const __closure) (/work/wpiutil/src/main/native/cpp/SafeThread.cpp:79)
libwpiutil.so!std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpiutil.so!std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __fn) (/usr/include/c++/11/bits/invoke.h:96)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::_M_invoke<0>(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:259)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::operator()(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:266)
libwpiutil.so!std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > >::_M_run(void)(std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > > * const this) (/usr/include/c++/11/bits/std_thread.h:211)
libstdc++.so.6!execute_native_thread_routine (Unknown Source:0)
libc.so.6!start_thread(void * arg) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_create.c:447)
libc.so.6!clone3() (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/unix/sysv/linux/x86_64/clone3.S:78)

In short, this appears to be an issue of lock ordering inversion.
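
To make the inversion concrete, here is a minimal sketch using two plain std::mutex objects as hypothetical stand-ins for the SimDeviceData and AsyncFunction mutexes (the names, the helper functions, and the use of std::mutex are all assumptions for illustration, not WPILib code). Each thread takes its first lock, waits until the other thread holds its own first lock, and then probes the second lock with try_lock() so the demo reports the would-be deadlock instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the traces above.
std::mutex sim_device_data_mutex;
std::mutex async_function_mutex;

// Holds `first`, waits until both threads hold their first lock, then probes
// `second`. Returns true if a blocking lock() on `second` would have waited.
bool WouldBlock(std::mutex& first, std::mutex& second,
                std::atomic<int>& holding, std::atomic<int>& probed) {
  std::scoped_lock hold(first);
  holding.fetch_add(1);
  while (holding.load() < 2) {
    std::this_thread::yield();
  }
  bool acquired = second.try_lock();
  if (acquired) {
    second.unlock();
  }
  // Keep `first` held until the other thread has probed it too.
  probed.fetch_add(1);
  while (probed.load() < 2) {
    std::this_thread::yield();
  }
  return !acquired;
}

// Runs the two threads with opposite lock orders. Returns true when each
// thread found its second lock already held by the other thread.
bool DemonstrateInversion() {
  std::atomic<int> holding{0};
  std::atomic<int> probed{0};
  bool a_blocked = false;
  bool b_blocked = false;
  std::thread a([&] {
    a_blocked = WouldBlock(sim_device_data_mutex, async_function_mutex,
                           holding, probed);
  });
  std::thread b([&] {
    b_blocked = WouldBlock(async_function_mutex, sim_device_data_mutex,
                           holding, probed);
  });
  a.join();
  b.join();
  return a_blocked && b_blocked;
}
```

If both probes were blocking lock() calls, as in the traces, neither thread could ever proceed.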

As a side note: I'm probably missing something, but it doesn't seem like AsyncFunction should need its own mutex if all of the calls are made from the same thread/loop. If they aren't, what is the reasoning behind that?

@PeterJohnson
Member

AsyncFunction needs a mutex because its entire purpose is to signal the loop thread from some other thread, and several member variables are modified on both caller thread(s) and the loop thread.

It does look like it should have a recursive_mutex to handle this case (or not hold the lock during the std::apply call, but that's trickier to get right).

@brettle
Contributor Author

brettle commented Jul 16, 2024

A recursive_mutex would only help if there was only one thread involved in the deadlock, right? Don't the stack traces I posted above indicate that there are 2 threads involved?
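
The point about recursive_mutex can be checked with a small standalone sketch (the struct and function names are hypothetical, chosen only for this illustration): the owning thread can lock a std::recursive_mutex again, but a second thread still cannot acquire it while the first thread holds it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical result type for this illustration.
struct ReentryResult {
  bool same_thread_reacquired;
  bool other_thread_acquired;
};

ReentryResult ProbeRecursiveMutex() {
  std::recursive_mutex mutex;
  ReentryResult result{false, false};

  // The calling thread takes ownership for the rest of the function.
  std::lock_guard<std::recursive_mutex> outer(mutex);

  // Same thread: a nested lock() returns immediately instead of blocking.
  mutex.lock();
  result.same_thread_reacquired = true;
  mutex.unlock();

  // Different thread: try_lock() fails because another thread owns the
  // mutex; a blocking lock() here would wait until `outer` is released.
  std::thread other([&] {
    result.other_thread_acquired = mutex.try_lock();
    if (result.other_thread_acquired) {
      mutex.unlock();
    }
  });
  other.join();
  return result;
}
```

So recursion support only removes self-deadlock on one thread; it does not change anything for a two-thread inversion.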

@PeterJohnson
Member

PeterJohnson commented Jul 16, 2024

Good point, yeah, it must be a lock inversion issue. In which case we need to release the lock when running the callback to prevent it (either in AsyncFunction or in SimDevice). It's a little unclear if the same AsyncFunction is being used in both threads, but that's the only thing that makes sense.

From the stack traces, it looks like thread A holds the following lock while trying to lock AsyncFunction.m_mutex:

  • SimDeviceData.m_mutex

and thread B holds the following locks while trying to lock SimDeviceData.m_mutex:

  • HALSimWSProviderSimDevice.m_ws
  • AsyncFunction.m_mutex

@brettle
Contributor Author

brettle commented Jul 16, 2024

Here's a different pair of stack traces for a different lock inversion deadlock created in the same way:

Thread calling SimDevice.create() is waiting on ProviderContainer mutex while holding SimDeviceData mutex.

libc.so.6!__futex_abstimed_wait_common(unsigned int * futex_word, unsigned int * futex_word@entry, unsigned int expected, unsigned int expected@entry, clockid_t clockid, clockid_t clockid@entry, const struct timespec * abstime, const struct timespec * abstime@entry, int private, int private@entry, _Bool cancel, _Bool cancel@entry) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/futex-internal.c:103)
libc.so.6!__GI___futex_abstimed_wait64(unsigned int * futex_word, unsigned int * futex_word@entry, unsigned int expected, unsigned int expected@entry, clockid_t clockid, clockid_t clockid@entry, const struct timespec * abstime, const struct timespec * abstime@entry, int private, int private@entry) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/futex-internal.c:128)
libc.so.6!__pthread_rwlock_wrlock_full64(pthread_rwlock_t * rwlock, clockid_t clockid, const struct timespec * abstime) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_rwlock_common.c:829)
libc.so.6!___pthread_rwlock_wrlock(pthread_rwlock_t * rwlock) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_rwlock_wrlock.c:26)
libhalsim_ws_server.so!std::__glibcxx_rwlock_wrlock(pthread_rwlock_t * __rwlock) (/usr/include/c++/11/shared_mutex:80)
libhalsim_ws_server.so!std::__shared_mutex_pthread::lock(class std::__shared_mutex_pthread * const this) (/usr/include/c++/11/shared_mutex:193)
libhalsim_ws_server.so!std::shared_mutex::lock(class std::shared_mutex * const this) (/usr/include/c++/11/shared_mutex:420)
libhalsim_ws_server.so!std::unique_lock<std::shared_mutex>::lock(class std::unique_lock<std::shared_mutex> * const this) (/usr/include/c++/11/bits/unique_lock.h:139)
libhalsim_ws_server.so!std::unique_lock<std::shared_mutex>::unique_lock(std::unique_lock<std::shared_mutex>::mutex_type & __m, class std::unique_lock<std::shared_mutex> * const this) (/usr/include/c++/11/bits/unique_lock.h:69)
libhalsim_ws_server.so!wpilibws::ProviderContainer::Add(class wpilibws::ProviderContainer * const this, std::string_view key, class std::shared_ptr<wpilibws::HALSimWSBaseProvider> provider) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/include/WSProviderContainer.h:31)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevices::DeviceCreatedCallback(class wpilibws::HALSimWSProviderSimDevices * const this, const char * name, HAL_SimDeviceHandle handle) (/usr/include/c++/11/string_view:137)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::Invoke<int&>(const char * name, const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:123)
libwpiHal.so!hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)>::operator()<char const*&, int&>(const class hal::impl::SimPrefixCallbackRegistry<void (*)(char const*, void*, int)> * const this) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceDataInternal.h:131)
libwpiHal.so!hal::SimDeviceData::CreateDevice(class hal::SimDeviceData * const this, const char * name) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:114)
libwpiHaljni.so!Java_edu_wpi_first_hal_SimDeviceJNI_createSimDevice(JNIEnv * env, jstring name) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/llvm/include/wpi/SmallVector.h:273)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)

Thread processing new HALSimWS client connection is waiting on SimDeviceData mutex while holding ProviderContainer mutex:

libwpiHal.so!wpi::recursive_spinlock1::try_lock(wpi::recursive_spinlock1 * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/include/wpi/spinlock.h:56)
libwpiHal.so!wpi::recursive_spinlock1::lock(wpi::recursive_spinlock1 * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/include/wpi/spinlock.h:71)
libwpiHal.so!std::scoped_lock<wpi::recursive_spinlock1>::scoped_lock(std::scoped_lock<wpi::recursive_spinlock1>::mutex_type & __m, std::scoped_lock<wpi::recursive_spinlock1> * const this) (/usr/include/c++/11/mutex:655)
libwpiHal.so!hal::SimDeviceData::RegisterValueCreatedCallback(hal::SimDeviceData * const this, HAL_SimDeviceHandle device, void * param, HALSIM_SimValueCallback callback, bool initialNotify) (/home/brettle/git/allwpilib/hal/src/main/native/sim/mockdata/SimDeviceData.cpp:346)
libhalsim_ws_server.so!wpilibws::HALSimWSProviderSimDevice::OnNetworkConnected(wpilibws::HALSimWSProviderSimDevice * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> ws) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/cpp/WSProvider_SimDevice.cpp:37)
libhalsim_ws_server.so!operator()(const struct {...} * const __closure) (/home/brettle/git/allwpilib/simulation/halsim_ws_server/src/main/native/cpp/HALSimWeb.cpp:143)
libhalsim_ws_server.so!std::__invoke_impl<void, wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>&, std::shared_ptr<wpilibws::HALSimWSBaseProvider> >(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>&, std::shared_ptr<wpilibws::HALSimWSBaseProvider> >(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void(std::shared_ptr<wpilibws::HALSimWSBaseProvider>), wpilibws::HALSimWeb::RegisterWebsocket(std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection>)::<lambda(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)> >::_M_invoke(const std::_Any_data &, std::shared_ptr<wpilibws::HALSimWSBaseProvider> &&)(const std::_Any_data & __functor, std::shared_ptr<wpilibws::HALSimWSBaseProvider> && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libhalsim_ws_server.so!std::function<void (std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>::operator()(std::shared_ptr<wpilibws::HALSimWSBaseProvider>) const(std::shared_ptr<wpilibws::HALSimWSBaseProvider> __args#0, const std::function<void(std::shared_ptr<wpilibws::HALSimWSBaseProvider>)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libhalsim_ws_server.so!wpilibws::ProviderContainer::ForEach(std::function<void (std::shared_ptr<wpilibws::HALSimWSBaseProvider>)>)(wpilibws::ProviderContainer * const this, wpilibws::ProviderContainer::IterFn fn) (/home/brettle/git/allwpilib/simulation/halsim_ws_core/src/main/native/include/WSProviderContainer.h:43)
libhalsim_ws_server.so!wpilibws::HALSimWeb::RegisterWebsocket(wpilibws::HALSimWeb * const this, std::shared_ptr<wpilibws::HALSimBaseWebSocketConnection> hws) (/home/brettle/git/allwpilib/simulation/halsim_ws_server/src/main/native/cpp/HALSimWeb.cpp:142)
libhalsim_ws_server.so!operator()<wpi::sig::Connection, std::basic_string_view<char> >(const struct {...} * const __closure) (/usr/include/c++/11/bits/shared_ptr_base.h:731)
libhalsim_ws_server.so!wpi::sig::detail::Slot<wpilibws::HALSimHttpConnection::ProcessWsUpgrade()::<lambda(auto:33, auto:34)>, wpi::sig::trait::typelist<wpi::sig::Connection&, std::basic_string_view<char, std::char_traits<char> > > >::call_slot(std::basic_string_view<char, std::char_traits<char> >)(wpi::sig::detail::Slot<wpilibws::HALSimHttpConnection::ProcessWsUpgrade()::<lambda(auto:33, auto:34)>, wpi::sig::trait::typelist<wpi::sig::Connection&, std::basic_string_view<char, std::char_traits<char> > > > * const this, std::basic_string_view<char, std::char_traits<char> > args#0) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:349)
libhalsim_ws_server.so!wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > >::operator()<std::basic_string_view<char, std::char_traits<char> >&>(wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:311)
libhalsim_ws_server.so!wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > >::operator()<std::basic_string_view<char, std::char_traits<char> >&>(wpi::sig::detail::SlotBase<std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:311)
libhalsim_ws_server.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots::operator()<std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:514)
libhalsim_ws_server.so!std::__invoke_impl<void, wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots&, std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots & __f) (/usr/include/c++/11/bits/invoke.h:61)
libhalsim_ws_server.so!std::__invoke_r<void, wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots&, std::basic_string_view<char, std::char_traits<char> > >(wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libhalsim_ws_server.so!std::_Function_handler<void (std::basic_string_view<char, std::char_traits<char> >), wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::CallSlots>::_M_invoke(std::_Any_data const&, std::basic_string_view<char, std::char_traits<char> >&&)(const std::_Any_data & __functor, std::basic_string_view<char, std::char_traits<char> > && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (std::basic_string_view<char, std::char_traits<char> >)>::operator()(std::basic_string_view<char, std::char_traits<char> >) const(std::basic_string_view<char, std::char_traits<char> > __args#0, const std::function<void(std::basic_string_view<char, std::char_traits<char> >)> * const this) (/usr/include/c++/11/bits/std_function.h:586)
libwpinet.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > >::operator()<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>(const wpi::sig::SignalBase<wpi::sig::detail::NullMutex, std::basic_string_view<char, std::char_traits<char> > > * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:573)
libwpinet.so!operator()<std::span<wpi::uv::Buffer> >(std::span<wpi::uv::Buffer, 18446744073709551615> bufs, const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/WebSocket.cpp:382)
libwpinet.so!std::__invoke_impl<void, wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)>&, std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpinet.so!std::__invoke_r<void, wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)>&, std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libwpinet.so!std::_Function_handler<void(std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error), wpi::WebSocket::StartServer(std::string_view, std::string_view, std::string_view)::<lambda(auto:24, wpi::uv::Error)> >::_M_invoke(const std::_Any_data &, std::span<wpi::uv::Buffer, 18446744073709551615> &&, wpi::uv::Error &&)(const std::_Any_data & __functor, std::span<wpi::uv::Buffer, 18446744073709551615> && __args#0, wpi::uv::Error && __args#1) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (std::span<wpi::uv::Buffer, 18446744073709551615ul>, wpi::uv::Error)>::operator()(std::span<wpi::uv::Buffer, 18446744073709551615ul>, wpi::uv::Error) const(wpi::uv::Error __args#1, std::span<wpi::uv::Buffer, 18446744073709551615> __args#0, const std::function<void(std::span<wpi::uv::Buffer, 18446744073709551615>, wpi::uv::Error)> * const this) (/usr/include/c++/11/bits/std_function.h:590)
libwpinet.so!operator()(wpi::uv::Error err, const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:21)
libwpinet.so!std::__invoke_impl<void, (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)>&, wpi::uv::Error>(struct {...} & __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpinet.so!std::__invoke_r<void, (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)>&, wpi::uv::Error>(struct {...} & __fn) (/usr/include/c++/11/bits/invoke.h:111)
libwpinet.so!std::_Function_handler<void(wpi::uv::Error), (anonymous namespace)::CallbackWriteReq::CallbackWriteReq(std::span<const wpi::uv::Buffer>, std::function<void(std::span<wpi::uv::Buffer>, wpi::uv::Error)>)::<lambda(wpi::uv::Error)> >::_M_invoke(const std::_Any_data &, wpi::uv::Error &&)(const std::_Any_data & __functor, wpi::uv::Error && __args#0) (/usr/include/c++/11/bits/std_function.h:290)
libwpinet.so!std::function<void (wpi::uv::Error)>::operator()(wpi::uv::Error) const(wpi::uv::Error __args#0, const std::function<void(wpi::uv::Error)> * const this) (/usr/include/c++/11/bits/std_function.h:586)
libwpinet.so!wpi::sig::SignalBase<wpi::sig::detail::NullMutex, wpi::uv::Error>::operator()<wpi::uv::Error>(const wpi::sig::SignalBase<wpi::sig::detail::NullMutex, wpi::uv::Error> * const this) (/home/brettle/git/allwpilib/wpiutil/src/main/native/thirdparty/sigslot/include/wpi/Signal.h:573)
libwpinet.so!operator()(uv_write_t * r, const struct {...} * const __closure, int status) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:130)
libwpinet.so!_FUN() (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/uv/Stream.cpp:131)
libwpinet.so!uv__write_callbacks(uv_stream_t * stream, uv_stream_t * stream@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/stream.cpp:926)
libwpinet.so!uv__stream_io(uv_loop_t * loop, uv__io_t * w, unsigned int events) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/stream.cpp:1228)
libwpinet.so!uv__run_pending(uv_loop_t * loop, uv_loop_t * loop@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:850)
libwpinet.so!uv_run(uv_loop_t * loop, uv_run_mode mode, uv_run_mode mode@entry) (/home/brettle/git/allwpilib/wpinet/src/main/native/thirdparty/libuv/src/unix/core.cpp:453)
libwpinet.so!wpi::uv::Loop::Run(wpi::uv::Loop::Mode mode, wpi::uv::Loop * const this) (/home/brettle/git/allwpilib/wpinet/src/main/native/include/wpinet/uv/Loop.h:113)
libwpinet.so!wpi::EventLoopRunner::Thread::Main(wpi::EventLoopRunner::Thread * const this) (/home/brettle/git/allwpilib/wpinet/src/main/native/cpp/EventLoopRunner.cpp:36)
libwpiutil.so!operator()(const struct {...} * const __closure) (/home/brettle/git/allwpilib/wpiutil/src/main/native/cpp/SafeThread.cpp:79)
libwpiutil.so!std::__invoke_impl<void, wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __f) (/usr/include/c++/11/bits/invoke.h:61)
libwpiutil.so!std::__invoke<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> >(struct {...} && __fn) (/usr/include/c++/11/bits/invoke.h:96)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::_M_invoke<0>(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:259)
libwpiutil.so!std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > >::operator()(std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > * const this) (/usr/include/c++/11/bits/std_thread.h:266)
libwpiutil.so!std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > >::_M_run(void)(std::thread::_State_impl<std::thread::_Invoker<std::tuple<wpi::detail::SafeThreadOwnerBase::Start(std::shared_ptr<wpi::SafeThreadBase>)::<lambda()> > > > * const this) (/usr/include/c++/11/bits/std_thread.h:211)
libstdc++.so.6!execute_native_thread_routine (Unknown Source:0)
libc.so.6!start_thread(void * arg) (/usr/src/debug/glibc-2.39-17.fc40.x86_64/nptl/pthread_create.c:447)
libc.so.6!clone3() (/usr/src/debug/glibc-2.39-17.fc40.x86_64/sysdeps/unix/sysv/linux/x86_64/clone3.S:78)

@PeterJohnson
Member

My sense is that the best fix will be to fix SimDeviceData to not hold its mutex during callbacks.
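
One common way to avoid holding a mutex during callbacks is to copy the callback list while locked and invoke the copies after unlocking. The sketch below illustrates that pattern with a hypothetical DeviceRegistry class; it is not the real hal::SimDeviceData, it uses std::mutex rather than the recursive spinlock seen in the traces, and it is not necessarily how the eventual fix was written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified registry; not the real hal::SimDeviceData.
class DeviceRegistry {
 public:
  using Callback = std::function<void(const std::string& name)>;

  void RegisterDeviceCreatedCallback(Callback callback) {
    std::scoped_lock lock(m_mutex);
    m_callbacks.push_back(std::move(callback));
  }

  void CreateDevice(const std::string& name) {
    std::vector<Callback> snapshot;
    {
      std::scoped_lock lock(m_mutex);
      m_devices.push_back(name);
      snapshot = m_callbacks;  // copy while the data is protected
    }  // m_mutex is released here, before any callback runs
    for (const auto& callback : snapshot) {
      callback(name);
    }
  }

  std::size_t DeviceCount() {
    std::scoped_lock lock(m_mutex);
    return m_devices.size();
  }

 private:
  std::mutex m_mutex;
  std::vector<std::string> m_devices;
  std::vector<Callback> m_callbacks;
};

// The callback calls back into the registry, which locks m_mutex again.
// That is only safe because CreateDevice() no longer holds the lock.
std::size_t CountSeenByCallback() {
  DeviceRegistry registry;
  std::size_t seen = 0;
  registry.RegisterDeviceCreatedCallback(
      [&](const std::string&) { seen = registry.DeviceCount(); });
  registry.CreateDevice("Gyro");
  return seen;
}
```

The trade-off with this pattern is that a callback can still be invoked shortly after it has been unregistered, because the snapshot was taken earlier, so callers need to tolerate that.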

brettle added a commit to DeepBlueRobotics/DeepBlueSim that referenced this issue Jul 16, 2024
Should ensure a reliable workaround of wpilibsuite/allwpilib#6842.

Also refactor some connection closing logic.
brettle changed the title from "Occasional hang due to deadlock when HALSimWS connection is received while SimDevice is being created." to "Occasional hang due to deadlock when HALSimWS connection is present while SimDevice is being created." on Jul 17, 2024
@brettle
Contributor Author

brettle commented Jul 17, 2024

Further investigation indicates that the first deadlock above appears to be occurring sometime after the HALSimWS connection has been established, so I've updated the title and comment to reflect that. This also makes the issue harder to work around because it means that one can't just delay creating SimDevices until after any HALSimWS connection has been made.

Side note: Is there any chance of this getting fixed in a future 2024.x release or will there not be any more 2024.x releases because the season is over? (Also, presumably this should be marked as a bug.)

@PeterJohnson PeterJohnson added type: bug Something isn't working. os: simulation component: hal Hardware Abstraction Layer labels Jul 17, 2024
@PeterJohnson
Member

No, we will not be making any more 2024.x releases. Our next release will be for 2025 beta.
