
Fix node-validator's state becoming stale due to disconnected leaf stream #2051

Merged
@Ayiga merged 4 commits into main from ts/fix/node-validator-leaf-stream-disconnect on Oct 7, 2024

@Ayiga (Member) commented Sep 23, 2024

Closes #2002

This PR:

Addresses an issue that occurs when the source Leaf Stream from the HotShot Query Service becomes unavailable, which leaves the node-validator service in a Stale state with no further updates delivered.

This is implemented with connection re-attempts that acquire a new Leaf Stream whenever the previous one completes. The re-attempts loop until a new Leaf Stream is returned, using an exponential backoff, up to a maximum of 100 attempts. After that the service panics, since at that point the HotShot Query Service is unlikely to come back soon.
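A minimal synchronous sketch of the retry-with-backoff behaviour described above. It assumes a `connect` closure stands in for the call that requests a Leaf Stream from the HotShot Query Service; the names `acquire_with_backoff`, `backoff_delay`, and `MAX_ATTEMPTS` are hypothetical, and the base delay and cap are parameters because the PR does not state the values it uses. The real service is async and its implementation may differ.

```rust
use std::fmt::Debug;
use std::thread;
use std::time::Duration;

/// Hypothetical cap mirroring the PR's "maximum of 100 attempts".
const MAX_ATTEMPTS: u32 = 100;

/// Delay after failed attempt `attempt` (0-based): `base * 2^attempt`,
/// saturating on overflow and capped at `max_delay`.
fn backoff_delay(base: Duration, max_delay: Duration, attempt: u32) -> Duration {
    // checked_shl returns None once the shift reaches 32 bits; saturate instead.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max_delay)
}

/// Calls `connect` until it succeeds, sleeping with exponential backoff between
/// failures. Panics once `MAX_ATTEMPTS` consecutive attempts have failed.
fn acquire_with_backoff<T, E: Debug>(
    base: Duration,
    max_delay: Duration,
    mut connect: impl FnMut() -> Result<T, E>,
) -> T {
    let mut last_err: Option<E> = None;
    for attempt in 0..MAX_ATTEMPTS {
        match connect() {
            Ok(stream) => return stream,
            Err(err) => {
                last_err = Some(err);
                // Skip the sleep after the final attempt; we are about to panic.
                if attempt + 1 < MAX_ATTEMPTS {
                    thread::sleep(backoff_delay(base, max_delay, attempt));
                }
            }
        }
    }
    panic!(
        "leaf stream unavailable after {} attempts; last error: {:?}",
        MAX_ATTEMPTS, last_err
    );
}
```

With a hypothetical 100 ms base and 5 s cap, the delays run 100 ms, 200 ms, 400 ms, and so on until they level off at 5 s; the cap keeps a long outage from producing ever-longer sleeps, while the attempt limit decides when to give up.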

@Ayiga force-pushed the ts/fix/node-validator-leaf-stream-disconnect branch 3 times, most recently from f76a249 to 2eb46dc, September 30, 2024 13:07
@Ayiga self-assigned this Oct 1, 2024
@Ayiga added 4 commits October 3, 2024 11:14
…ream

The Leaf Stream that is sourced from the HotShot Query Service can become disconnected
due to a variety of issues that are not accounted for within the node-validator API.
The causes vary, but they all have the same result: either a failure to retrieve
a Stream, or the end of an existing Stream.

In either of these cases the node-validator will not attempt to re-establish the Leaf
Stream, resulting in the validator's state becoming stale over time.

To address this issue, changes have been made so that the node-validator attempts
to re-acquire the initial Leaf Stream state whenever the stream is lost. If this
retrieval fails, it tries again and again with an exponential backoff. If the failure
persists for too long, the process eventually gives up and panics instead.
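The other half of the fix is noticing that an existing stream has ended and requesting a new one instead of going stale. A minimal sketch, assuming a plain `Iterator` stands in for the async Leaf Stream and `u64` stands in for HotShot's leaf type. `follow_leaves`, `Leaf`, and `max_streams` are hypothetical; `connect` returning `None` models giving up, and `max_streams` exists only so the sketch terminates. In the service, `connect` would be the backoff-wrapped acquisition described above.

```rust
/// Stand-in for HotShot's leaf type (assumption for this sketch).
type Leaf = u64;

/// Drains leaves from successive streams. When a stream ends, a new one is
/// requested via `connect` rather than leaving the consumer stale.
/// `connect` returning `None` means "give up"; `max_streams` bounds the
/// total number of streams acquired so the loop always terminates.
fn follow_leaves<S, F, H>(mut connect: F, mut handle: H, max_streams: usize)
where
    S: Iterator<Item = Leaf>,
    F: FnMut() -> Option<S>,
    H: FnMut(Leaf),
{
    for _ in 0..max_streams {
        let stream = match connect() {
            Some(stream) => stream,
            None => break,
        };
        for leaf in stream {
            handle(leaf);
        }
        // The stream has ended (e.g. the query service disconnected):
        // fall through to the next iteration and reconnect.
    }
}
```

Keeping reconnection in the consumer loop means a stream that ends and a stream that never arrives are handled by the same path, which matches the two failure cases listed in the commit message.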
@Ayiga force-pushed the ts/fix/node-validator-leaf-stream-disconnect branch from be1229a to 6ff63cf, October 3, 2024 17:14
@Ayiga merged commit c7c61f4 into main Oct 7, 2024
15 checks passed
@Ayiga deleted the ts/fix/node-validator-leaf-stream-disconnect branch October 7, 2024 15:04
2 participants