Refactor channel state in lightningd #6628

rustyrussell · 2023-08-27T04:08:24Z

It started with our handling of the funding depth callback, which badly needed cleaning up. But it quickly led me to our channel state functions, which are unclear, especially since we added dual funding states and a splicing state. This makes the code cleaner and easier to update in future.

I stopped short of converting "uncommitted_channel" to a channel in a new state, but that is the next obvious step (this entire series was a side-quest as I was trying to clean up our channel_update handling!).

lightningd/peer_control.c

Previously, unexpected behavior in the ln helper function had gone unnoticed. While Rusty Russell was working on [1], he discovered a bug in lnprototest: the helper function that opens a channel counted the current block height incorrectly. Ideally, this logic should be abstracted elsewhere. This commit fixes the block height counting error and resolves the bug identified in [1].

[1] ElementsProject/lightning#6628

Link: https://discord.com/channels/899980449231814676/941465665540325397/1154259454716563537
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Vincenzo Palazzo <[email protected]>

vincenzopalazzo

Left some comments while I was reviewing the PR to understand what changed from before.

In order to fix lnprototest you should build on top of #6702

The problem was that lnprototest was counting the current block height in the wrong way, which meant that core lightning was going out of sync (causing the reorg).

ddustin · 2023-09-21T16:35:17Z

Oh sweet! A clean up of funding depth would be awesome!

ddustin · 2023-09-21T17:00:52Z

Reviewed the parts related to splicing, ACK 659ed0f

While looking at this I've realized in splice confirmation, channel_control is setting channel->scid too early. A peer disconnection and restart of channeld here will cause channeld to boot up with the new scid before mutual_splice_lock, causing scid disagreement. Which is probably one of the contributors to the aggressive splice restart test issue.

That's an existing problem not caused by this refactor -- though fixing it will cause merge conflicts 😅

Edit: I think this was easier to notice because of the cleaner code post refactor -- woo clean code!
Put in issue for reference: #6703

rustyrussell · 2023-09-28T00:57:15Z

OK, this needed some rework: there was more to clean up!

In particular, we used funding_depth_cb for both the main funding tx, as well as inflights (DF RBF attempts, splicing). This was deeply confusing, and caused weird issues until I understood it. I've now separated those out.

Currently it's half done in funding_depth_cb, and half in channeld_tell_depth. It's very confusing as a result, with splicing, dual-funding and zeroconf. This does introduce a behaviour change: if a channel is NORMAL and it gets reorganized, we force close (unless we were the one who funded it, or it's zeroconf anyway). This is safer than continuing to use the channel in this case! Some tests are changed to zeroconf to make them work, but v2 doesn't support zeroconf, so that's removed. Signed-off-by: Rusty Russell <[email protected]>

We should use capability tests for states (can you add htlcs?) rather than vague descriptions (are you closing?). And as much as possible, use switch () statements to force us to think about all the cases, especially when we add new states! Signed-off-by: Rusty Russell <[email protected]>
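A minimal sketch of the "capability test plus switch()" style, assuming a heavily reduced set of states. The enum and function below are hypothetical stand-ins (prefixed `sketch_`); the real state enum in lightningd has more members, and which states allow adding HTLCs is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of channel states for illustration only. */
enum sketch_state {
	SKETCH_AWAITING_LOCKIN,
	SKETCH_NORMAL,
	SKETCH_AWAITING_SPLICE,
	SKETCH_SHUTTING_DOWN,
	SKETCH_ONCHAIN,
	SKETCH_CLOSED
};

/* Capability test: can we add HTLCs in this state? */
static bool sketch_can_add_htlc(enum sketch_state state)
{
	/* No default case: compilers that warn about unhandled enum values
	 * in a switch will flag this function when a new state is added. */
	switch (state) {
	case SKETCH_NORMAL:
	case SKETCH_AWAITING_SPLICE:
		return true;
	case SKETCH_AWAITING_LOCKIN:
	case SKETCH_SHUTTING_DOWN:
	case SKETCH_ONCHAIN:
	case SKETCH_CLOSED:
		return false;
	}
	/* Only reached if state holds a value outside the enum. */
	assert(false && "unhandled channel state");
	return false;
}
```

Callers then ask the specific question they care about (`sketch_can_add_htlc(state)`) instead of a vague one such as "is this channel closing?".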

We usually hand times by copy, not by pointer (and if we did, they should be const!). I noticed this particularly for the state changed code, but it goes down to json_add_timeiso, so I fixed that too. Signed-off-by: Rusty Russell <[email protected]>

…ALOPEND_OPEN_COMMITTED. The latter is used when we're put in the db, the former is the uncommitted state. Currently dbid == 0 is used in addition to the state, which is unwieldy. Signed-off-by: Rusty Russell <[email protected]> Changelog-Experimental: JSON-RPC: added new dual-funding state `DUALOPEND_OPEN_COMMITTED`

We never do this, but we're about to (we always watch before we broadcast a tx). We use a `depth` member to avoid calling the callback multiple times for the same event, but we initialize it to 0. This means if we register a watch, and the first thing that happens is that it reorganizes out, we *don't* make the callback. Use an impossible value at initialization, instead. Signed-off-by: Rusty Russell <[email protected]>

We use the *same* callback for the funding tx, as well as for inflight dual-funding txs, as well as inflight splice txs. This is deeply confusing! Instead, use explicit cbs for splicing and df. Once they're locked in, use the normal callback. Signed-off-by: Rusty Russell <[email protected]>
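As a rough illustration of the separation described above, the sketch below picks a dedicated depth callback per transaction kind and falls back to the normal one once the transaction is locked in. The enum, the callback type, and every function name are hypothetical; the real callbacks in lightningd take different arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical kinds of transaction we might watch. */
enum sketch_tx_kind {
	SKETCH_MAIN_FUNDING,
	SKETCH_DF_INFLIGHT,
	SKETCH_SPLICE_INFLIGHT
};

/* Hypothetical callback type: returns a label so the sketch is observable. */
typedef const char *(*sketch_depth_cb)(unsigned int depth);

static const char *sketch_funding_depth_cb(unsigned int depth)
{
	(void)depth;
	return "main funding tx";
}

static const char *sketch_df_inflight_depth_cb(unsigned int depth)
{
	(void)depth;
	return "dual-funding inflight";
}

static const char *sketch_splice_inflight_depth_cb(unsigned int depth)
{
	(void)depth;
	return "splice inflight";
}

/* One explicit callback per kind; once locked in, use the normal one. */
static sketch_depth_cb sketch_pick_depth_cb(enum sketch_tx_kind kind,
					    bool locked_in)
{
	if (locked_in)
		return sketch_funding_depth_cb;

	switch (kind) {
	case SKETCH_MAIN_FUNDING:
		return sketch_funding_depth_cb;
	case SKETCH_DF_INFLIGHT:
		return sketch_df_inflight_depth_cb;
	case SKETCH_SPLICE_INFLIGHT:
		return sketch_splice_inflight_depth_cb;
	}
	/* Only reached if kind holds a value outside the enum. */
	assert(false && "unhandled tx kind");
	return sketch_funding_depth_cb;
}
```

Each callback then only has to reason about one kind of transaction, rather than guessing which case it is handling.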

…ce scid. We used the original channel funding output number. I'm not sure if this was true in the previous code, or a regression I introduced, but it caused occasional failures in test_splice_gossip! Signed-off-by: Rusty Russell <[email protected]>

Now we're not always using the same functions to watch during dual-funding opening, we need to make sure we're watching the close (in particular, df close before the opening is confirmed). So, keep a pointer, and if it's not set in drop_to_chain, set it. Signed-off-by: Rusty Russell <[email protected]>

rustyrussell added this to the v23.11 milestone Aug 27, 2023

rustyrussell force-pushed the refactor-channel-state branch from 34044ef to f9ece98 Compare August 28, 2023 02:16

rustyrussell requested a review from cdecker as a code owner August 28, 2023 02:16

rustyrussell force-pushed the refactor-channel-state branch 2 times, most recently from ccb877d to 659ed0f Compare September 20, 2023 04:38

vincenzopalazzo self-assigned this Sep 21, 2023

vincenzopalazzo reviewed Sep 21, 2023

lightningd/peer_control.c (outdated, resolved)

vincenzopalazzo reviewed Sep 21, 2023

lightningd/peer_control.c (outdated, resolved)

vincenzopalazzo mentioned this pull request Sep 21, 2023

fix: Correct reorg issue in the ln helper function rustyrussell/lnprototest#107

Merged

vincenzopalazzo reviewed Sep 21, 2023

rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from 661ffbd to e4032d9 Compare September 28, 2023 00:55

rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from db77fc9 to c8d0e93 Compare October 1, 2023 02:58

rustyrussell added 2 commits October 1, 2023 13:49

poetry: run poetry update.

e32118a

Signed-off-by: Rusty Russell <[email protected]>

contrib/pyln-grpc-proto/ regenerate.

35aa3bc

Signed-off-by: Rusty Russell <[email protected]>

rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from fd1d019 to e49a2dc Compare October 1, 2023 11:02

rustyrussell added 2 commits October 2, 2023 09:29

pytest: fix flake in upfront warning.

0f29c80

This is actually a real issue (l1 doesn't see the warning before l2 drops the connection), but it's unrelated to this PR, and will require another one to fix. Signed-off-by: Rusty Russell <[email protected]>

patch remove-developer-test-annotations.patch

e82fa35

rustyrussell added 24 commits October 2, 2023 09:29

doc: fix listpeerchannels schema to allow CHANNELD_AWAITING_SPLICE in…

0050dbc

… state. Signed-off-by: Rusty Russell <[email protected]>

pytest: make test_splice_gossip more precise.

4bef9df

Check the exact scids. Makes it simpler when failures occur. Signed-off-by: Rusty Russell <[email protected]>

lightningd: disconnect on *any* transient error, except abort

bb8c49f

Not just if htlc addition is too slow, make this the default. dual-open's txabort is excluded, however. Signed-off-by: Rusty Russell <[email protected]>

lightningd: clean up channel_tell_depth.

4162cd9

It's a mess right now. Try to express it as a switch() statement over the states we can be in. Signed-off-by: Rusty Russell <[email protected]>

lightningd: don't report original depth once splice started.

00fabc5

This is a workaround, the real fix is to use a different callback for inflight splice attempts, which comes later. Signed-off-by: Rusty Russell <[email protected]>

lightningd: make dualopen_tell_depth match channeld_tell_depth.

441e61c

Rename slightly, remove first arg, and make it a noop if there's no owner on channel. Signed-off-by: Rusty Russell <[email protected]>

lightningd: generalize peer_any_active_channel to peer_any_channel.

2b2caa5

Take an optional filter function, so callers can say exactly what they want. Signed-off-by: Rusty Russell <[email protected]>
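A minimal sketch of the "optional filter function" idea, assuming a plain array of channels. The struct, the filter type, and the function names are hypothetical and are not the actual `peer_any_channel` signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified channel for this sketch. */
struct sketch_chan {
	int id;
	bool active;
};

typedef bool (*sketch_chan_filter)(const struct sketch_chan *chan);

/* Returns the first channel accepted by filter, or the first channel at
 * all if filter is NULL; returns NULL if nothing matches. */
static const struct sketch_chan *
sketch_any_channel(const struct sketch_chan *chans, size_t n,
		   sketch_chan_filter filter)
{
	assert(chans != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!filter || filter(&chans[i]))
			return &chans[i];
	}
	return NULL;
}

/* Example filter: callers say exactly what they want. */
static bool sketch_is_active(const struct sketch_chan *chan)
{
	return chan->active;
}
```

Passing `NULL` gives the "any channel" behaviour, while passing `sketch_is_active` reproduces the narrower "any active channel" query.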

lightningd: remove peer_any_unsaved_channel and use peer_any_channel.

1c9722d

Signed-off-by: Rusty Russell <[email protected]>

wallet: add standard sanity-check function for channel_state.

d9c54df

Signed-off-by: Rusty Russell <[email protected]>

lightningd/channel.h: rename channel_unsaved to the more explicit cha…

947a516

…nnel_state_uncommitted. Signed-off-by: Rusty Russell <[email protected]>

doc: introduce new state DUALOPEND_OPEN_COMMITTED.

eef8c96

This is the variant of DUALOPEND_OPEN_INIT which you see once the channel is in the db: we'll be adding it next, but to reduce clutter the docs are added as a separate commit. Signed-off-by: Rusty Russell <[email protected]>

lightningd: make channel-query functions all take state.

e326f85

It has the information we need, now. Signed-off-by: Rusty Russell <[email protected]>

lightningd: remove watch_tx() in favor of watch_txid().

055d600

It was a wrapper only used in one place anyway. Signed-off-by: Rusty Russell <[email protected]>

lightningd: make watch_txid more generic.

add836e

Don't assume the arg is a channel. Signed-off-by: Rusty Russell <[email protected]>

lightningd: fix dual-funding case where we coop close and an RBF conf…

8fea8cd

…irms. Signed-off-by: Rusty Russell <[email protected]>

lightningd: simplify funding_depth_cb now it only handles main fundin…

d333f19

…g tx. We make dualopend_tell_depth static, which means we move it higher in the file. Signed-off-by: Rusty Russell <[email protected]>

rustyrussell force-pushed the refactor-channel-state branch from e49a2dc to d333f19 Compare October 1, 2023 23:00

rustyrussell merged commit 72f914a into ElementsProject:master Oct 2, 2023
38 checks passed
