Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover plugin to automate recovery in CLN #6853

Merged
merged 6 commits into from
Feb 16, 2024

Conversation

adi2011
Copy link
Collaborator

@adi2011 adi2011 commented Nov 8, 2023

This would help users detect that they've lost some data and would automatically try to recover from SCB or peer storage.
Read #6544 for more!

plugins/recover.c Outdated Show resolved Hide resolved
Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the general approach, but we also need to have an idea when gossipd thinks it is synced (probably in getinfo). If we are on regtest, I would simply consider us always up-to-date, but for mainnet you could use some heuristic like "more than 1000 channels and the seeker is in state NORMAL" maybe?

lightningd/notification.c Outdated Show resolved Hide resolved
gossipd/gossipd.c Outdated Show resolved Hide resolved
@adi2011 adi2011 marked this pull request as ready for review December 11, 2023 10:49
@adi2011 adi2011 requested a review from cdecker as a code owner December 11, 2023 10:49
@adi2011 adi2011 requested a review from rustyrussell December 12, 2023 04:07
Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far! Minor nitpicks...

doc/schemas/listpeerchannels.schema.json Outdated Show resolved Hide resolved
lightningd/channel.h Outdated Show resolved Hide resolved
plugins/libplugin.c Outdated Show resolved Hide resolved
plugins/libplugin.h Outdated Show resolved Hide resolved
plugins/recover.c Outdated Show resolved Hide resolved
plugins/recover.c Show resolved Hide resolved
plugins/recover.c Show resolved Hide resolved
plugins/recover.c Outdated Show resolved Hide resolved
plugins/recover.c Outdated Show resolved Hide resolved
plugins/recover.c Outdated Show resolved Hide resolved
@adi2011 adi2011 force-pushed the RecoverPlugin branch 3 times, most recently from 7e3a4b4 to b42e775 Compare January 16, 2024 14:12
@cdecker cdecker self-assigned this Jan 16, 2024
@cdecker cdecker added this to the v24.02 milestone Jan 16, 2024
@adi2011 adi2011 force-pushed the RecoverPlugin branch 7 times, most recently from da38b09 to 6cbf8f5 Compare January 22, 2024 18:16
Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, but test fails for some reason?

tests/test_misc.py Outdated Show resolved Hide resolved

# l2.daemon.wait_for_log(r'All outputs resolved.*')
wait_for(lambda: l2.rpc.listfunds()["channels"][0]["state"] == "ONCHAIN")
wait_for(lambda: l2.rpc.listfunds()["channels"][1]["state"] == "ONCHAIN")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failing here: seems like l2 only has one channel?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passes locally, but fails here for some reason. I am trying to resolve it...

@adi2011 adi2011 force-pushed the RecoverPlugin branch 5 times, most recently from 52b73bf to ad35815 Compare February 1, 2024 07:53
@cdecker
Copy link
Member

cdecker commented Feb 2, 2024

Hi @adi2011, I noticed that one of the failing test suites is the fuzz tester. That one has picked up some memory leak in a test binary that is unrelated to your PR. If you'd like to remove the fuzz testing from the PR CI runs, you can rebase on master.

See #7029 for more information :-)

@adi2011
Copy link
Collaborator Author

adi2011 commented Feb 2, 2024

Hi @cdecker, I've rebased on master. But it'll throw error which is rectified in #7031.

@cdecker cdecker removed this from the v24.02 milestone Feb 13, 2024
@cdecker
Copy link
Member

cdecker commented Feb 13, 2024

Remvoed milestone as this has a crash bug, and we need to move on with the release process. Happy to re-add it if fixed in the next RC.

@adi2011 adi2011 force-pushed the RecoverPlugin branch 13 times, most recently from da547bc to df36ca9 Compare February 16, 2024 04:28
@adi2011
Copy link
Collaborator Author

adi2011 commented Feb 16, 2024

The CI error is fixed here #7080

@cdecker
Copy link
Member

cdecker commented Feb 16, 2024

Rebased on top of master and hopefully we can line it up for inclusion into v24.02 anyway.

…'d help us identify if we've fall behind or lost some state.
… and try to recover the node by entering mode.
…ected nodes on the network and call emergency_recover immediately.
…orage and then call restorefrompeer repeatedly.
… being recovered when we lose some state and enter recover mode.
@cdecker
Copy link
Member

cdecker commented Feb 16, 2024

@adi2011 notice that if you force-push we have to run all the tests again, whereas if we just restart we can just rerun the failing cases. Let me take care of CI and this will be merged soon.

@cdecker cdecker merged commit cdb0001 into ElementsProject:master Feb 16, 2024
36 checks passed
@adi2011
Copy link
Collaborator Author

adi2011 commented Feb 16, 2024

Finally!! 🎉🎉❤️

@cdecker cdecker mentioned this pull request Jun 13, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants