This commit adds a limit to the number of gossips a node will store in memory at any one time. This makes nodes resilient against acquiring too many stale node profiles from the network.
This change fixes #1599 and makes the processing requirements of Jormungandr deterministic with respect to the total number of node profiles being gossiped in the network.
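For readers unfamiliar with the idea, here is a minimal sketch of a capacity-bounded profile store. It is not the code in this PR: the `BoundedProfiles` type, its string-based profile representation, and the evict-oldest policy are all assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical bounded store of node profiles, keyed by node id.
/// The `String` value is a stand-in for a real profile type.
pub struct BoundedProfiles {
    capacity: usize,
    profiles: HashMap<String, String>,
    /// Ids in the order they were last inserted, oldest at the front.
    order: VecDeque<String>,
}

impl BoundedProfiles {
    pub fn new(capacity: usize) -> Self {
        BoundedProfiles {
            capacity,
            profiles: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Insert or refresh a profile, evicting the oldest entry when full.
    pub fn insert(&mut self, id: &str, address: &str) {
        if self.capacity == 0 {
            return;
        }
        if self.profiles.insert(id.to_string(), address.to_string()).is_some() {
            // Known id: drop its old position so it can move to the back.
            // This scan is O(n); a real implementation would likely avoid it.
            self.order.retain(|k| k.as_str() != id);
        } else if self.profiles.len() > self.capacity {
            // New id pushed us over the limit: evict the oldest profile.
            if let Some(oldest) = self.order.pop_front() {
                self.profiles.remove(&oldest);
            }
        }
        self.order.push_back(id.to_string());
    }

    pub fn len(&self) -> usize {
        self.profiles.len()
    }

    pub fn contains(&self, id: &str) -> bool {
        self.profiles.contains_key(id)
    }
}
```

Whatever eviction policy is used, the point is that memory and per-gossip processing stay bounded by `capacity` rather than by the number of profiles circulating in the network.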
This is what we see before the change. The processing requirements increase with the number of profiles being processed into p2p overlays.
Nodes become so bogged down that they start to miss block production (see #1580). Eventually they fall too far behind the tip of the main chain and de-sync completely. In the longer term, nodes crash outright due to exhausted resources.
Here is the result after this change, with the default max peer size set to 10000.
In this case, my CPU usage never climbs above 2% and the node never de-syncs from the network.
A follow-up to this change will allow the maximum number of stored node profiles to be configured, though 10000 as a default seems to be more than sufficient for most nodes for the time being.
This change doesn't solve the underlying issue of stale nodes being gossiped on the network forever. Solving that will probably require a protocol-level enhancement (e.g. including a liveness timestamp in propagated gossip). However, the fix does not conflict with the poldercast specification. It simply accepts the resource limitations of the node.
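To make the liveness-timestamp idea concrete, here is a rough sketch of how a receiving node could drop stale gossip if such a timestamp existed. Nothing here is part of poldercast or Jormungandr today: the `Gossip` struct, its `last_seen_secs` field, and the `drop_stale` function are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical gossip entry carrying a liveness timestamp
/// (seconds since the Unix epoch at which the node was last known alive).
pub struct Gossip {
    pub id: String,
    pub last_seen_secs: u64,
}

/// Keep only the gossips whose liveness timestamp is within `max_age` of `now_secs`.
pub fn drop_stale(gossips: Vec<Gossip>, now_secs: u64, max_age: Duration) -> Vec<Gossip> {
    gossips
        .into_iter()
        // saturating_sub guards against timestamps slightly in the future.
        .filter(|g| now_secs.saturating_sub(g.last_seen_secs) <= max_age.as_secs())
        .collect()
}
```

A scheme like this would also have to decide who is allowed to update the timestamp and how much clock skew to tolerate, which is why it belongs at the protocol level rather than in this PR.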