Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding IsReady to dialer interface #1436

Merged
merged 42 commits into from
Dec 9, 2024
Merged

adding IsReady to dialer interface #1436

merged 42 commits into from
Dec 9, 2024

Conversation

WendelHime
Copy link
Contributor

@WendelHime WendelHime commented Nov 6, 2024

This pull request aim to fix the issues brought by this comment. Now we should be able to ignore dialers when they're not ready to be called

@WendelHime WendelHime marked this pull request as ready for review November 6, 2024 18:34
Copy link
Contributor

@garmr-ulfr garmr-ulfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looked like it was tedious having to go add isReady to every dialer! Good start though. There are a couple of bugs with dialerAvailable; I think getlantern/eventual would fix them and might even simplify everything a bit. Your way works too! Just an FYI if you weren't already aware of it. It's really useful and deals with a lot of the boilerplate of waiting on values with a timeout.

bypass/bypass.go Outdated
Comment on lines 97 to 105
go func() {
for {
time.Sleep(60 * time.Second)
if dialer.IsReady() {
b.startProxy(name, config, configDir, userConfig, dialer)
break
}
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have a way to stop this in the event a dialer never becomes ready or becomes ready after new proxies are already assigned. It's kind of unlikely a dialer is never ready, but if someone (probably me) forgets/unaware this goroutine is created, they might not add a timeout to the dialer themselves.

Maybe a chan or ctx that gets closed/canceled in Reset and is recreated if we need to wait for dialers again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm thanks for the suggestion! I'll make IsReady return an error too, so if err is nil and ok is false then it should try again.

Copy link
Contributor Author

@WendelHime WendelHime Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this way we can also customize when to retry to start a proxy or call the func to create a implementation again depending of the returned error

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will still suffer from the same problem though; It's not guaranteed that isReady will eventually return an it's ready or an error. If, at some point, we were to update water to retry downloading the WASM when it fails, or add a new protocol that might not be ready immediately, the caller can't assume that the dialer itself isn't running an infinite loop. Or there might be a bug where isReady never returns true or a non nil error. The goroutines won't ever be cleaned up, leading to memory leaks that we may not catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm it makes sense, thanks catching this! I've updated the bypass code:

  1. Added a go routine function loadProxyAsync for creating a context that timeout after 10 min.
  2. The function also provides an atomic boolean for the internal go routine that actually check if the proxy is ready or not. So the loadProxyAsync waits until the context timeout or it receives a channel signal saying it's ready. When ready the internal go routine should stop looping with the flag marked as false and the ready channel is closed.

chained/water_impl_test.go Outdated Show resolved Hide resolved
chained/water_impl_test.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
@WendelHime WendelHime requested a review from garmr-ulfr November 8, 2024 15:42
@WendelHime
Copy link
Contributor Author

Some changes have been implemented at this PR and fix a data race that was happening between bypass and bandit when initializing the WATER proxy. Both basically start at the same time and it was starting to download and truncating the downloaded data making one of them unusable, so I've added a package-level mutex to make sure it runs one at time.

Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the synchronization here needs a little work, but I like the overall direction!

bandit/bandit.go Outdated Show resolved Hide resolved
bypass/bypass.go Outdated Show resolved Hide resolved
bypass/bypass.go Outdated Show resolved Hide resolved
bypass/bypass.go Outdated Show resolved Hide resolved
bypass/bypass.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_version_control.go Outdated Show resolved Hide resolved
bandit/bandit.go Outdated
Comment on lines 366 to 367
// IsReady indicates when the dialer is ready for dialing
IsReady() (bool, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of complexity (in the bandit and bypass code) created by two things:

  1. The double return value here. The first value and second value each need to be checked independently before calling code can proceed to use the dialer.
  2. A lack of synchronization provided by this function. Callers are left having to poll this function. Since we're trying to communicate when an event happens (when the dialer is ready), this feels like a natural use case for a channel.

Maybe instead, something like:

// Ready returns a channel which will have a value on it only when initialization
// of the dialer is complete. If initialization failed, the channel will have a non-nil
// error value.
Ready() chan<- error

Then callers use this like:

select {
case err := <-dialer.Ready():
    if err != nil {
        return fmt.Errorf("dialer initialization failed: %w", err)
    }
case <-ctx.Done():
    return ctx.Error()
}
// Here we know the dialer is ready to use.

Your way definitely works, to be clear! I'm fine with that if you prefer it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm this is valid and would work perfectly on the bandit side, but on bypass, how do we know when we should start the proxy asynchronously? Should we start all proxies asynchronously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm I'll return nil by default, if the chan is nil It's ready by default

@WendelHime WendelHime requested a review from hwh33 November 19, 2024 20:14
Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good now, just a problem with the mutex map and a small note on use of the dialer.

dialer/dialer.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
dialer/bandit.go Outdated Show resolved Hide resolved
@WendelHime WendelHime requested a review from hwh33 November 26, 2024 20:44
Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, just noticed one more thing 😬

readyChan := d.Ready()
if readyChan != nil {
select {
case err := <-readyChan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey sorry, I just realized: this only works once, right? Once someone reads from the ready channel, there will never again be a value on it.

I think the ready channel should probably only be used to signal readiness and should just be closed when the dialer is ready. Errors in initialization can be returned by the dial function.

Sorry I didn't catch this earlier!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ready channel should probably only be used to signal readiness and should just be closed when the dialer is ready.

Makes sense to me!

Errors in initialization can be returned by the dial function.

If we do this, the proxy will be punished by the bandit algorithm, should we still do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to create a logic for sending a response for all listeners

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated the ready logic from the water implementation.

Whenever we receive a ready() call for checking if the dialer is ready:

  1. We check if the dialer is ready, if it is we return a nil response
  2. We check if there was an error loading the WASM file.
    2.1 In this case, all other ready calls have already received an answer and they're closed, so this is a late party check
    2.2 In this case we return a temp channel and start a go func that send the error and close the channel.
  3. We create a channel, append to a internal list of channels and return to whoever is calling ready().

When we finish loading the WASM by either succeeding or failing, we broadcast the ready status to all channels. If the channel take more than one second for receiving the message, the channel is probably not being listened so it'll be closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors in initialization can be returned by the dial function.

If we do this, the proxy will be punished by the bandit algorithm, should we still do it?

If there was an error in initialization then the proxy is dead in the water. In that case, we don't care if it gets punished by the bandit algorithm since it will error every time anyway (actually, we want it to get de-prioritized).

So a simply closed channel, with errors returned on dial attempts would work. What you've got works too though! Totally up to you.

chained/water_impl.go Outdated Show resolved Hide resolved
chained/water_impl.go Outdated Show resolved Hide resolved
readyChan := d.Ready()
if readyChan != nil {
select {
case err := <-readyChan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors in initialization can be returned by the dial function.

If we do this, the proxy will be punished by the bandit algorithm, should we still do it?

If there was an error in initialization then the proxy is dead in the water. In that case, we don't care if it gets punished by the bandit algorithm since it will error every time anyway (actually, we want it to get de-prioritized).

So a simply closed channel, with errors returned on dial attempts would work. What you've got works too though! Totally up to you.

Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of options for simplification, but this looks good to me!

@WendelHime
Copy link
Contributor Author

I've made an E2E test with this changes and I was able to run WATER smoothly with the newest flashlight changes. The multiplex is also working as expected so this is ready for any other reviews or ready to be merged

Copy link
Contributor

@hwh33 hwh33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @WendelHime! Feel free to merge whenever you're ready

@WendelHime WendelHime merged commit 30d5ac4 into main Dec 9, 2024
1 check passed
@WendelHime WendelHime deleted the feat/1442-is-ready branch December 9, 2024 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants