Skale admin doesn't restart schain monitoring if monitoring hangs #1044

oleksandrSydorenkoJ · 2024-02-07T13:48:50Z

Describe the bug
In Skale Admin, two monitor processes run for each chain: one is responsible for the status of contracts (creating chains, rotating ERC), and the other checks the status of the chain's Skaled container. Sometimes the chain monitoring process hangs and stops initiating checks. If the Skaled container then runs into problems, the monitor fails to perform the recovery or restart actions it is supposed to.

Preconditions:
16 nodes
8 active chains
2 GETH nodes for 8 nodes each
Disabled auto_repair option in container config stream

Versions:
skalenetwork/admin:2.5.4-develop.1

To Reproduce
No clear reproduction steps are known.
To investigate, the failure needs to occur again with an increased log size for Skale-admin on QANet.

Expected behavior
Skale admin should run both monitors once the Geth nodes are synced.

Actual state:
In some cases, Skale admin runs only the contract monitor and does not restart the schain monitor after it hangs.

Temporary workaround:
Restart the Skale admin container manually
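
The expected behavior above amounts to a watchdog: each monitor reports a heartbeat after a completed check cycle, and a supervisor restarts any monitor that stays silent for too long. A minimal Python sketch of that idea follows. It is not the actual skale-admin code; `MonitorWatchdog`, the monitor names, the 60-second timeout, and the `restart` callback are all hypothetical, and the clock is injected so the example runs without real processes.

```python
import time
from typing import Callable, Dict, List


class MonitorWatchdog:
    """Restarts any monitor whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout: float, restart: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.restart = restart  # hypothetical callback: kill and respawn the named monitor
        self.clock = clock      # injectable so the logic can be exercised without sleeping
        self._last_beat: Dict[str, float] = {}

    def heartbeat(self, name: str) -> None:
        """Called by a monitor at the end of each completed check cycle."""
        self._last_beat[name] = self.clock()

    def check(self) -> List[str]:
        """Restart every stale monitor and return the names that were restarted."""
        now = self.clock()
        stale = [n for n, t in self._last_beat.items() if now - t > self.timeout]
        for name in stale:
            self.restart(name)
            self._last_beat[name] = now  # give the restarted monitor a fresh window
        return stale


# Simulated run: the contract monitor keeps reporting, the skaled monitor hangs.
fake_now = [0.0]
restarted: List[str] = []
watchdog = MonitorWatchdog(timeout=60.0, restart=restarted.append,
                           clock=lambda: fake_now[0])

watchdog.heartbeat("contract_monitor")
watchdog.heartbeat("skaled_monitor")
fake_now[0] = 50.0
watchdog.heartbeat("contract_monitor")  # still alive
fake_now[0] = 100.0                     # skaled_monitor has been silent for 100 s
print(watchdog.check())                 # only the silent monitor is restarted
```

In a real deployment the `restart` callback would terminate the hung process and spawn a new one; that part is left out here because it depends on how the monitors are launched.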

PolinaKiporenko · 2024-02-07T16:32:53Z

Investigate during the 2.3 release.

PolinaKiporenko · 2024-04-16T11:31:50Z

Temporary workaround: repair by restarting the admin container.

yatsunastya · 2024-11-01T18:46:29Z

Checked ✅
Version: 2.8.0-beta.1

Since it is not clear how to verify this ticket directly, it was decided to check only that the main functionality isn't broken.

oleksandrSydorenkoJ added the bug label Feb 7, 2024

oleksandrSydorenkoJ assigned badrogger Feb 7, 2024

oleksandrSydorenkoJ added this to SKALE Engineering 🚀 Feb 7, 2024

PolinaKiporenko modified the milestone: SKALE 2.3 Feb 7, 2024

DmytroNazarenko modified the milestones: SKALE 2.3, SKALE 2.4 Feb 7, 2024

PolinaKiporenko moved this to Ready For Pickup in SKALE Engineering 🚀 Feb 21, 2024

PolinaKiporenko modified the milestones: SKALE 2.4, SKALE 2.5 Apr 9, 2024

badrogger moved this from Ready For Pickup to In Progress in SKALE Engineering 🚀 Jun 4, 2024

badrogger moved this from In Progress to Ready For Pickup in SKALE Engineering 🚀 Jul 18, 2024

badrogger moved this from Ready For Pickup to In Progress in SKALE Engineering 🚀 Jul 19, 2024

badrogger linked a pull request Aug 30, 2024 that will close this issue

Fix stuck monitor recovery. Avoid DB related deadlocks. #1098

Merged

PolinaKiporenko closed this as completed Oct 21, 2024

github-project-automation bot moved this from Code Review to Ready For Release Candidate in SKALE Engineering 🚀 Oct 21, 2024

PolinaKiporenko moved this from Ready For Release Candidate to Merged To Release Candidate in SKALE Engineering 🚀 Oct 21, 2024

EvgeniyZZ moved this from QA to Done in SKALE Engineering 🚀 Nov 5, 2024
