Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cygthread: suspend thread before terminating. #234

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

jeremyd2019
Copy link
Member

@jeremyd2019 jeremyd2019 commented Nov 13, 2024

This addresses an extremely difficult to debug deadlock when running under emulation on ARM64.

A relatively easy way to trigger this bug is to call fork(), then within the child process immediately call another fork() and then exit() the intermediate process.

It would seem that there is a "code emulation" lock on the wait thread at this stage, and if the thread is terminated too early, that lock still exists albeit without a thread, and nothing moves forward.

It seems that a SuspendThread() combined with a GetThreadContext() (to force the thread to actually be suspended, for more details see https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743) makes sure the thread is "booted" from emulation before it is suspended.

Hopefully this means it won't be holding any locks or otherwise leave emulation in a bad state when the thread is terminated.

Also, attempt to use CancelSynchonousIo() (as seen in flock.cc) to avoid the need for TerminateThread() altogether. This doesn't always work, however, so was not a complete fix for the deadlock issue.

Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html

Fixes msys2/msys2-autobuild#62, fixes #228 (and some other issues scattered about I don't remember off-hand)

Awaiting review on cygwin-patches mailing list: https://inbox.sourceware.org/cygwin-patches/[email protected]/T/#u

This addresses an extremely difficult to debug deadlock when running
under emulation on ARM64.

A relatively easy way to trigger this bug is to call `fork()`, then within the
child process immediately call another `fork()` and then `exit()` the
intermediate process.

It would seem that there is a "code emulation" lock on the wait thread at
this stage, and if the thread is terminated too early, that lock still exists
albeit without a thread, and nothing moves forward.

It seems that a `SuspendThread()` combined with a `GetThreadContext()`
(to force the thread to _actually_ be suspended, for more details see
https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743)
makes sure the thread is "booted" from emulation before it is suspended.

Hopefully this means it won't be holding any locks or otherwise leave
emulation in a bad state when the thread is terminated.

Also, attempt to use `CancelSynchonousIo()` (as seen in `flock.cc`) to avoid
the need for `TerminateThread()` altogether.  This doesn't always work,
however, so was not a complete fix for the deadlock issue.

Addresses: https://cygwin.com/pipermail/cygwin-developers/2024-May/012694.html
Signed-off-by: Jeremy Drake <[email protected]>
@lazka
Copy link
Member

lazka commented Nov 13, 2024

-> msys2/MSYS2-packages#5007

lazka added a commit to msys2/MSYS2-packages that referenced this pull request Nov 13, 2024
@lazka
Copy link
Member

lazka commented Nov 13, 2024

This sadly results in lots of errors being printed (w11 26100):

user@desktop MSYS ~
$ pacman -Su
      0 [waitproc] pacman 1159 proc_waiter: error on read of child wait pipe 0x58C, Win32 error 995
                                                                                                         0 [waitproc] pacman 1167 proc_waiter: error on read of child wait pipe 0x590, Win32 error 995
                                    0 [waitproc] pacman 1169 proc_waiter: error on read of child wait pipe 0x5C0, Win32 error 995
                                                                                                                                       0 [waitproc] pacman 1171 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
                                                                  0 [waitproc] pacman 1173 proc_waiter: error on read of child wait pipe 0x5F0, Win32 error 995
                                                                                                                                                                     0 [waitproc] pacman 1175 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
                                                                                                0 [waitproc] pacman 1177 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
                           0 [waitproc] pacman 1179 proc_waiter: error on read of child wait pipe 0x5EC, Win32 error 995
                                                                                                                              0 [waitproc] pacman 1181 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
                                                         0 [waitproc] pacman 1183 proc_waiter: error on read of child wait pipe 0xB0, Win32 error 995
                                                                                                                                                           0 [waitproc] pacman 1187 proc_waiter: error on read of child wait pipe 0x5EC, Win32 error 995
                                                                                      0 [waitproc] pacman 1189 proc_waiter: error on read of child wait pipe 0x5E4, Win32 error 995
                 0 [waitproc] pacman 1193 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
                                                                                                                    0 [waitproc] pacman 1195 proc_waiter: error on read of child wait pipe 0x5F4, Win32 error 995
                                         :: Starting core system upgrade...
 there is nothing to do
:: Starting full system upgrade...
 there is nothing to do

@jeremyd2019
Copy link
Member Author

jeremyd2019 commented Nov 13, 2024

D'oh. That error is expected due to CancelSynchronousIo

C:\>net helpmsg 995

The I/O operation has been aborted because of either a thread exit or an application request.

I wonder why I didn't see that when I tested this on ARM64

@jeremyd2019

This comment was marked as outdated.

jeremyd2019 added a commit to jeremyd2019/winautoconfig that referenced this pull request Nov 14, 2024
After msys2/msys2-runtime#234, we shouldn't hit msys2/msys2-autobuild#62
anymore, so remove the workarounds for it.  Also remove the sed for
enabling clangarm64 in pacman.conf, since it has been enabled by default
for a couple of years now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants