From 14737a408c435b8c66b1f04da017859a177cacb1 Mon Sep 17 00:00:00 2001 From: 0o-de-lally <1364012+0o-de-lally@users.noreply.github.com> Date: Sun, 19 Dec 2021 14:38:59 -0500 Subject: [PATCH] v5.0.6 (#907) * changelog * text --- ol/changelog/5_0_6.md | 69 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) create mode 100644 ol/changelog/5_0_6.md diff --git a/ol/changelog/5_0_6.md b/ol/changelog/5_0_6.md new file mode 100644 index 0000000000..1e146c0900 --- /dev/null +++ b/ol/changelog/5_0_6.md @@ -0,0 +1,69 @@ +## 5.0.6 + +This is an upgrade with patches mitigate an ongoing network incident. More details here: https://hackmd.io/KY7VklgAR72Bg7ETTQZAxA + +These changes do not affect Move code (no changes to system state or policies). It only upgrades `diem-node` + +#### TL;DR + +Just update binaries to use patches. + +``` + +# install new binaries +cd ~/libra +git checkout v5.0.6 -f +make bins install + + + +``` + +### Summary + +TBD + +### Changes + +##### Move Changes +None + +##### Rust Changes +##### - prevent DiemAccount::1003 txs from remaining in mempool + +One line change to prevent the VM transaction verification on admission from allowing transactions from clients that are more advanced than the current state of validator. The change prevents a DiemAccount::1003 PROLOGUE_ESEQUENCE_NUMBER_TOO_NEW transaction error from remaining in the mempool. + +This happens when a TX has a sequence number newer than the validator has for that account. This means the validator is behind sync compared to the client. + +Possible fix to inundation of prologue errors DiemAccount::1003 during the Dec 16-18 network incident described here https://hackmd.io/KY7VklgAR72Bg7ETTQZAxA. + +It seems txs with prologue errors keep accumulating but they in mempool and then keep getting rebroadcast to other nodes that were disconnected (related PR: diem/diem#9437). This seems to contribute network to halt. But it's still unclear why this would cause transport errors, and eventually Validators from contacting each other. cc @gregnazario + +Tip from user shb +https://discord.com/channels/903339070925721652/916100220855648296/921932021884928030 +"One idea for a mitigation: try setting this flag to false (which should tell mempool to drop any txs with new sequence #'s on the floor instead of holding on to them) https://github.com/OLSF/libra/blob/main/language/diem-vm/src/diem_transaction_validator.rs#L91 + +https://github.com/OLSF/libra/pull/906 + +##### - disable mempool resends all txns after disconnect + +Validators don't resend txns after disconnect to ones that have already +confirmed which could cause some issues when many nodes restart or get +stuck. + +This patch already exists in diem Upstream in a newer version (1.4) than 0L's fork (1.3). The upstream PR is https://github.com/diem/diem/pull/9437 + +https://github.com/OLSF/libra/pull/905 + + +##### Tests + +- All continuous integration tests passed. +- The diem-code was tested on production nodes that were both synced and out of sync. + +##### Compatibility +The Move framework changes are backwards compatible with `diem-node` from v5.0.1 + +### Rust changes +#### Diem Node +Changes to `diem-node` is compatible with on-chain state from `v5.0.5`.