Problem
The other day our competition endpoint stopped showing data for recent transactions. Settlement event indexing had failed because, according to our database, the same settlement transaction appeared in two different blocks. The block before the one Etherscan now shows had been reorged: our settlement was removed from it and included in the next block instead. As a result, the same settlement tx hash was associated with two different blocks in the settlements table, and a subquery failed because it returned more than one row.
Impact
Settlement indexing, which is used for debugging and rewards calculation, stopped. The stale record had to be removed manually.
To reproduce
Write an integration test for the autopilot that uses anvil to simulate reorg behavior:
1. Mine the settlement in block x.
2. Revert the chain to block x-1.
3. Mine a block without the settlement (block x).
4. Mine the settlement in block x+1.
Expected behaviour
Reorg detection deletes the first settlement (recorded in the reorged block x) in favor of the second settlement (in block x+1).
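The reproduction steps can be modelled without a node to show how the duplicate row appears. This is only a minimal in-memory sketch, not the autopilot's code: `Settlement`, `replace_events` and `reorg_with_event_derived_range` are hypothetical names, the block number 100 and hash "0xabc" are made up, and the assumption is that the replaced range is derived from the blocks that contain events on the new chain.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for a row in the settlements table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Settlement {
    pub block: u64,
    pub tx_hash: &'static str,
}

/// Deletes every stored row whose block is inside `range`, then appends `new_events`.
pub fn replace_events(
    store: &mut Vec<Settlement>,
    range: RangeInclusive<u64>,
    new_events: Vec<Settlement>,
) {
    store.retain(|row| !range.contains(&row.block));
    store.extend(new_events);
}

/// Replays the reproduction steps with x = 100 and returns the stored rows.
pub fn reorg_with_event_derived_range() -> Vec<Settlement> {
    let x = 100;
    // Step 1: the settlement is mined in block x and indexed.
    let mut store = vec![Settlement { block: x, tx_hash: "0xabc" }];
    // Steps 2-4: after the reorg, block x is empty and the settlement is in x + 1.
    let new_events = vec![Settlement { block: x + 1, tx_hash: "0xabc" }];
    // Assumed behaviour: the replaced range only spans blocks that contain
    // events on the new chain, so it is x+1..=x+1 and never touches block x.
    let first = new_events.iter().map(|e| e.block).min().unwrap();
    let last = new_events.iter().map(|e| e.block).max().unwrap();
    replace_events(&mut store, first..=last, new_events);
    store
}
```

Calling `reorg_with_event_derived_range()` leaves the same tx hash stored under both block 100 and block 101, which is the state that made the subquery return more than one row.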
Screenshots/logs
https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/GKc86 l
Additional context
I believe the offending code is here:
services/crates/shared/src/event_handling.rs
Lines 336 to 357 in 148a922
We replace events in the range returned by self.past_events_by_block_hashes. However, this only returns the blocks that contain events on the reorged chain (block x+1 in the example above), while the previous settlement is stored in block x. This means we don't replace the previous event and only add the new one, unless the settlement was reorged into the same block. (If it is reorged into an earlier block I believe we will also miss it, but this doesn't really happen in practice.)
I believe the fix would be to replace events over the old block range (latest_blocks.first() -> latest_blocks.last()). This may cause issues with partial updates, which I'm not sure are used in practice. Better tests of this code are needed to clarify.
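To make the proposed fix concrete, here is a small in-memory sketch of replacing events over the range of re-fetched blocks instead of the range of new events. It is not the code in event_handling.rs: `Row`, `replace_in_range`, `apply_reorg` and `reorged_settlement_rows` are hypothetical, and `latest_blocks` is assumed here to be an ascending slice of block numbers, which may differ from the real type.

```rust
/// Hypothetical stand-in for a settlements row: (block number, tx hash).
pub type Row = (u64, &'static str);

/// Removes all rows in blocks `first..=last`, then inserts `new_events`.
pub fn replace_in_range(store: &mut Vec<Row>, first: u64, last: u64, new_events: &[Row]) {
    store.retain(|(block, _)| *block < first || *block > last);
    store.extend_from_slice(new_events);
}

/// Applies a reorg using the range of re-fetched blocks rather than the
/// range of blocks that happen to contain events on the new chain.
pub fn apply_reorg(store: &mut Vec<Row>, latest_blocks: &[u64], new_events: &[Row]) {
    // Assumption: `latest_blocks` lists the re-fetched block numbers in ascending order.
    if let (Some(&first), Some(&last)) = (latest_blocks.first(), latest_blocks.last()) {
        replace_in_range(store, first, last, new_events);
    }
}

/// Runs the example from the issue with x = 100 and an unrelated older row.
pub fn reorged_settlement_rows() -> Vec<Row> {
    let mut store = vec![(99, "0x111"), (100, "0xabc")];
    // Blocks 100 and 101 were re-fetched; only block 101 contains the settlement now.
    apply_reorg(&mut store, &[100, 101], &[(101, "0xabc")]);
    store
}
```

In this sketch the stale row in block 100 is dropped because the whole re-fetched range is cleared first. That is only safe if `new_events` covers every block in the range, which is where the concern about partial updates comes in.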