Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure executor doesn't deadlock when closure errors #152

Merged
merged 1 commit into from
Oct 20, 2024

Conversation

ifross89
Copy link

When running 'gomod2nix' on in my project, the 'gomod2nix import' was failing for every import. I have more imports than the default maxJobs.

This caused a deadlock and the program never finished, due to an issue in the ParallelExecutor.

This is because in the erroring case, we send to the errChan, which is a blocking channel. If this blocks then the defers are never called, most importantly the defer which pulls an entry off the semaphore (e.guard).

This means once the erroring work functions exceeds the numWorkers, we will block trying to acquire the semaphore when we call .Add with more work.

We never get to the point where we call .Wait(), which would drain the errChan becuase we are blocked on the semaphore whilst we are still generating work.

This change moves the semaphore acquire to within the goroutines themselves. This alters the behaviour in that we now will start as many goroutines as we have work items, but the work they do will still be gated by the semaphore.

This is reasonable behaviour: goroutines are cheap, in general this package is useful if the work the functions are doing is expensive not the goroutine creation itself. The work still is guarded by the semaphore.

There is also a regression test added and in passing, the spelling of Parallel is corrected.

When running 'gomod2nix' on in my project, the 'gomod2nix import' was
failing for every import. I have more imports than the default maxJobs.

This caused a deadlock and the program never finished.

This is because in the erroring case, we send to the errChan, which is a
blocking channel. If this blocks then the defers are never called, most
importantly the `defer` which pulls an entry off the semaphore
(e.guard).

This means once the erroring work functions exceeds the numWorkers, we
will block trying to acquire the semaphore when we call .Add with more
work.

We never get to the point where we call .Wait(), which would drain the
errChan becuase we are blocked on the semaphore whilst we are still
generating work.

This change moves the semaphore acquire to within the goroutines
themselves. This alters the behaviour in that we now will start as many
goroutines as we have work items, but the work they do will still be
gated by the semaphore.

This is reasonable behaviour: goroutines are cheap, in general this
package is useful if the work the functions are doing is expensive not
the goroutine creation itself. The work still is guarded by the
semaphore.

There is also a regression test added and in passing, the spelling of
Parallel is corrected.
@marcusramberg
Copy link
Collaborator

Thanks, I think this makes a lot of sense.

@marcusramberg marcusramberg merged commit 5d38709 into nix-community:master Oct 20, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants