recover: The recovery plugin crashes if a channel has not fully been forgotten #7162
Comments
My node ID is 038e5677dc7d1ec4e9a82bd004d524b2469fedb0867e85bd1e2f5d3352957e03b7
The first crash dump log file I can find from today, crash.log.20240320221217, has a lot of very large blocks of numbers after the jsonrpc#67IO_IN: entries. At the end of the log file I then get this:
Grepping my log file for the channel that was force closed (Boltz), I see this:
Gossipd, channel with a lost state. This is what stood out to me. I also don't know yet what caused the force close. Maybe this is why lightningd won't start up? Did I lose my funds?
Ping @adi2011: this appears to be an issue with the recovery plugin, which attempts to recover a channel as if it were fully forgotten when the CLN node "just" lost state. Generally speaking: if a node does not start, DO NOT tinker with the database unless told to. This includes upgrades and downgrades. Don't worry, the funds should be safe, and we just need to get the node running again. I'll let @adi2011 guide you through the steps, but I'll note that we can get you unstuck by adding If you could take a snapshot of the DB (not the
As for things currently in flight, only channel 29 is at risk of closure, as it is the only one with an HTLC attached.
Thank you cdecker. High respect. I am at my computer for the rest of the day. I have made a snapshot of my .lightning\bitcoin directory, and I also started the node with the command .\lightningd --disable-plugin=recover (not recovery). After I get this resolved, I might continue operating a Core Lightning node, having peace of mind knowing what to do next time if this happens again. For that force closed chan 831793x2527x0 I see this now
and then an HTLC with direction "in" and state "RCVD_REMOVE_REVOCATION". Just let me know what I should check or try out now, if you could. I'll follow your instructions. Thank you very much.
For that force closed chan I do see cause set to "local". Does that indicate that it was my node that initiated the force close? So strange. Anyway, I am able to start up lightningd, but only when the recover plugin is disabled. I have also been using the --offline flag since then, out of caution.
I think that already means we're past the problem, it was just the recover plugin crashing due to a misidentified missing channel. You should be able to just run the node normally now.
What do you suggest? When I attempt to start it up normally again, it crashes like it did earlier. I can only start it up when the recover plugin is disabled.
I do have a backup of accounts.sqlite3 and emergency.recover from before these issues, if that is helpful at all.
Email from Boltz:
I replaced parts with [REDACTED INFO]
When I run ./lightning-cli listpeers I see
Update, TL;DR summary: my Lightning node eventually forgot the channel that was force closed, after which I was able to start up lightningd normally again. I still need help recovering funds from the unilateral close transaction.
I started up lightningd with the recover plugin disabled, then reconnected to the peer that force closed the chan (Boltz). I then saw my 8.3 million sats from the forced channel closure listed as "onchain". I restarted lightningd again with the recover plugin disabled and let it catch up with the blockchain. After it caught up, I restarted lightningd once more with the recover plugin disabled and saw that the problem channel (the one that force closed) had been forgotten by lightningd. I then restarted lightningd normally (not disabling the recover plugin). That problem channel is indeed forgotten by my node, which lets lightningd start up normally again.
However, I still need to recover my sats from that forced channel closure. I see this when I type ./lightning-cli listfunds
How do I claim these from that unilateral close, though? I do not see these funds in my wallet. Will lightningd sweep these funds after some delayed output? Or are these funds lost because lightningd lost the state of this channel? Is the delayed output the peer's delayed output of, I think, 144 blocks, or my delayed output of what I think was around 1400 blocks?
I was able to withdraw all funds including the ones from the forced channel closure by using the command Not sure why it was not showing up in Sparrow wallet, but my funds are returned. Core Lightning remains solid!
If anyone experiences an issue like I did, the solution is:
1. Start up the node with the recover plugin disabled.
2. Use the listpeers command and look at the lost_state field to see which channel is causing the problems. Reconnect to that node if it is disconnected.
3. Restart lightningd again with the recover plugin disabled and let it catch up with the blockchain.
4. Restart lightningd again with the recover plugin disabled to see if the problem channel is corrected or forgotten.
5. Once the channel state is corrected or forgotten, you should be able to restart lightningd normally without having to disable the recover plugin.
All these steps might be unnecessary, I am just documenting the steps I took to resolve this. Now, why this chan force closed I don't know. Maybe it was because my computer crashed and something goofed up somewhere. You can close this out, or if someone wants to keep this open to investigate further, I will cooperate in investigating what went wrong if needed. Thanks
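For anyone repeating the lost_state check described above, here is a minimal Python sketch of one way to scan JSON output for channels flagged with lost_state. It assumes you have saved or piped the JSON printed by ./lightning-cli listpeers; the exact layout of that output is an assumption here, so the helper walks the whole document instead of relying on a fixed structure. The function name find_lost_state and the sample data (including the second channel ID) are hypothetical.

```python
import json


def find_lost_state(node, path="$"):
    """Recursively collect (json_path, short_channel_id) for every
    object whose "lost_state" field is exactly true."""
    hits = []
    if isinstance(node, dict):
        if node.get("lost_state") is True:
            hits.append((path, node.get("short_channel_id")))
        for key, value in node.items():
            hits.extend(find_lost_state(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_lost_state(item, f"{path}[{index}]"))
    return hits


# Hypothetical, heavily trimmed sample; real output has many more fields
# and may nest channels differently.
sample = json.loads("""
{"peers": [{"id": "peer-a", "channels": [
    {"short_channel_id": "831793x2527x0", "lost_state": true},
    {"short_channel_id": "800000x1x1", "lost_state": false}]}]}
""")

for json_path, scid in find_lost_state(sample):
    print(f"lost_state set at {json_path} (channel {scid})")
```

Because the walk is structure-agnostic, the same helper should also work if the channel objects sit at a different nesting level than in the sample.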
Yes, CLN supports the We also support
You can always start without the |
I rephrased the issue now that the funds are back, and we want to address the underlying issue. |
So the issue is that we were trying to insert a channel (in the db) at an 'id' which is already occupied, and all this happens while running the rpc We have a check inside Hence the possible reasons for this could be:
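The id collision described above can be illustrated in miniature with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the channels table below is a hypothetical stand-in and not lightningd's real schema, and try_insert is an illustrative helper, not code from the recover plugin. It shows that inserting a row at an id that still holds a leftover trace fails with a constraint error, while a free id succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table; NOT lightningd's real schema.
conn.execute("CREATE TABLE channels (id INTEGER PRIMARY KEY, peer TEXT)")
# A leftover trace of a channel that was not fully forgotten.
conn.execute("INSERT INTO channels (id, peer) VALUES (1, 'existing-trace')")
conn.commit()


def try_insert(conn, chan_id, peer):
    """Insert a row; return True on success, False on an id collision."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO channels (id, peer) VALUES (?, ?)",
                (chan_id, peer),
            )
        return True
    except sqlite3.IntegrityError:
        return False


collided = not try_insert(conn, 1, "recovered-stub")  # id 1 is taken
inserted = try_insert(conn, 2, "recovered-stub")      # id 2 is free
row_count = conn.execute("SELECT COUNT(*) FROM channels").fetchone()[0]
print(collided, inserted, row_count)
```

Handling the constraint error explicitly, as the helper does, is one way to avoid a crash when the target id is already present; whether the actual fix takes this form is not stated in this thread.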
@adi2011 I have not done anything with the emergency.recover file. Also, this happened again today. Both last time and this time it happened after my computer turned off from a system crash or froze up, forcing me to do a hard reboot. I have noticed that my computer only has these crashes when I run elementsqt, which is the GUI for Liquid, the Bitcoin sidechain; elementsd (non-GUI) works fine.
Today I found my computer turned off, probably from a system crash. When I started it up again, I let Bitcoin Core fully sync to the latest block, then started lightningd, and the same issues occurred. This also caused another channel to unilaterally close on me. Both times, the crash caused a channel to unilaterally close and lightningd to not start up unless I disable the recover plugin. Last time I was able to start it normally after the killed channel was forgotten.
It might also be helpful to know that these issues started with version 24.02: the first case documented in this issue was on version 24.02, and the second incident, the one from today, is on version 24.02.1. This is also the first time I've seen channels unilaterally close when my system crashes. Before version 24, when my system crashed I was able to start up lightningd normally and retain my channels. Steps taken:
I plan to see if lightningd starts up normally again later, once that chan is forgotten. In summary: for some reason, a system crash causes a channel to unilaterally close, and when I start up lightningd after the unilateral close, the recover plugin freaks out and lightningd crashes.
Adding to my last post, I checked the sha256 hash of my emergency.recover file and it is the same as the one in my backup from before issues, so that verifies that the emergency.recover file was not changed since my last backup. |
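The hash comparison described above can be reproduced with a short Python sketch using the standard hashlib module. The helper names sha256_of and same_contents are illustrative, and the demo uses throwaway files with placeholder bytes instead of a real emergency.recover; point the helpers at your own live file and backup copy to run the actual check.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Demo with throwaway files standing in for the live file and its backup.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "emergency.recover")
    backup = os.path.join(tmp, "emergency.recover.bak")
    for path in (live, backup):
        with open(path, "wb") as fh:
            fh.write(b"placeholder bytes")
    unchanged = same_contents(live, backup)

print("unchanged since backup:", unchanged)
```

Matching digests only show that the two files are byte-identical; they say nothing about whether the file's contents are still current for the node's channels.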
After the channel was forgotten by lightning, I was able to start it up normally without needing to disable the recover plugin |
Hi @nakoshi-satamoto, I think the issue is because Answering to your questions:
|
BTW is it still throwing |
I was travelling, pardon my delayed response. I have not had that issue since I did the steps I described above. It starts up normally now, without needing to disable the recover plugin.
That means your DB forgot the channel that was hitting the unique ID constraint. Do you have a snapshot of the DB from the point when it was throwing the error?
This should be resolved now... |
Fixed in #7216 |
New description
The recover plugin appears to crash if we had data loss, but the database still contains a trace of the channel, causing the insert to fail during the recovery.
Original issue
Issue and Steps to Reproduce
My system froze up and was non-responsive. I had to do a forced reboot of the computer. lightningd won't even start up now. I'm on version 24.02; I even tried reverting to the old version 23.11.2 and still had no luck starting this node. I also re-extracted clightning-v24.02-Ubuntu-22.04.tar.xz and tried running v24.02 again, with no luck. I cannot run the newer version 24.02.1 because it is not signed with any valid known signature, so 24.02 is the latest version I can use. Yes, I was mirroring the sqlite database, but that is pointless if the mirror gets corrupted too, if this is a case of database corruption.
Below is the output I get when running in debug mode.
While repeatedly trying to start up Core Lightning, I noticed that one of my channels with a lot of funds got force closed. The chan that got force closed was 831793x2527x0 (Boltz), with the closure transaction ID being e6df69daa0eefd910d871601bbf0e78e838b6396c79601a28dbbf31bb8312591
I am scared that these funds have been lost forever, if this was a penalty closure, or if this was a force closure and my lightning node lost the information needed to claim the funds after the wait time.
At this point, I'm done with lightning. Too many issues over my time with running core lightning. I just want to recover my funds. I'm okay if I have to force close all my chans if it means I get my funds back.
1. How can I get core lightning working again, or recover my funds?
2. Assuming the database is corrupted and that chan was indeed forced closed, am I able to claim the funds from the forced close script after the wait time?
Output from lightningd in debug mode