Indexer Causes Unclean Shutdown Race Condition #1387

crypto-services · 2024-12-12T04:46:59Z

System information

Bor client version: 0.5.1

Heimdall client version: 1.0.10

OS & Version: Ubuntu 22.04/24.04

Environment: Polygon Mainnet

Type of node: All

Overview of the problem

When attempting to shut down the Bor service (SIGINT), the process appears unable to exit if the indexer was running at the time. The SIGINT then times out and triggers a SIGKILL, which either discards the recent state or, worse, corrupts the database.

Reproduction Steps

Happens often when stopping the Bor service with SIGINT.

Logs / Traces / Output / Error Messages

Gracefully shutting down agent...
INFO [12-12|03:06:12.072] HTTP server stopped                      endpoint=127.0.0.1:8545
INFO [12-12|03:06:12.072] IPC endpoint closed                      url=/home/polygon/.bor/data/bor.ipc
INFO [12-12|03:06:12.072] Stats daemon stopped
INFO [12-12|03:06:12.073] Ethereum protocol stopped
INFO [12-12|03:06:12.073] Transaction pool stopped
INFO [12-12|03:06:12.074] Waiting background transaction indexer to exit
INFO [12-12|03:06:22.080] Looking for peers                        peercount=1 tried=135 static=5
INFO [12-12|03:06:32.085] Looking for peers                        peercount=1 tried=168 static=5
...

The node never reaches the "Writing cached state to disk" step and is eventually killed. This does not happen when the indexer isn't running at the moment of shutdown.


crypto-services · 2024-12-13T01:13:47Z

This has just occurred again while updating one of our sentries. We've updated 6 of our nodes in the past couple of days, and it happened on 50% of them, all with the same behaviour. We have a 10-minute timeout set for our service, and we deploy/update with Ansible, so there is no change to the process.

All of them print "Waiting background transaction indexer to exit", then "Looking for peers" for 10 minutes, followed by "Killing process 3706391 (bor) with signal SIGKILL". "Writing cached state to disk" never occurs, which causes the node to lose unwritten blocks, as per the image underneath.
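
For anyone wanting to check their own nodes for the same sequence, here is a small Go sketch that scans captured log lines for the indexer wait message without a later state flush message. The two message strings are taken from the logs in this issue. The function name `shutdownStalled` is made up for this example, and how the log lines are collected (for example from the service journal) is left to the caller.

```go
package main

import "strings"

// Log messages as they appear in this issue.
const (
	indexerWaitMsg = "Waiting background transaction indexer to exit"
	stateFlushMsg  = "Writing cached state to disk"
)

// shutdownStalled reports whether the log shows the indexer wait message
// that is never followed by the state flush message.
func shutdownStalled(logLines []string) bool {
	waiting := false
	for _, line := range logLines {
		switch {
		case strings.Contains(line, indexerWaitMsg):
			waiting = true
		case waiting && strings.Contains(line, stateFlushMsg):
			waiting = false
		}
	}
	return waiting
}
```

A true result only means the log matches the pattern reported here. It does not by itself confirm the cause.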

marcello33 · 2024-12-19T06:30:09Z

Hey @crypto-services have you tried with a more recent version of bor?

crypto-services · 2024-12-23T01:33:47Z

> Hey @crypto-services have you tried with a more recent version of bor?

The issue still appears to be present on v1.5.3.
