Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor channel state in lightningd #6628

Merged

Conversation

rustyrussell
Copy link
Contributor

It started with our handling of the funding depth callback, which needed cleaning up badly. But it lead me quickly to our channel state functions, which are unclear, especially as we added dual funding states and a splicing state. This makes the code cleaner and easier to update in future.

I stopped short of converting "uncommitted_channel" to a channel in a new state, but that is the next obvious step (this entire series was a side-quest as I was trying to clean up our channel_update handling!).

@rustyrussell rustyrussell added this to the v23.11 milestone Aug 27, 2023
@rustyrussell rustyrussell force-pushed the refactor-channel-state branch from 34044ef to f9ece98 Compare August 28, 2023 02:16
@rustyrussell rustyrussell requested a review from cdecker as a code owner August 28, 2023 02:16
@rustyrussell rustyrussell force-pushed the refactor-channel-state branch 2 times, most recently from ccb877d to 659ed0f Compare September 20, 2023 04:38
@vincenzopalazzo vincenzopalazzo self-assigned this Sep 21, 2023
vincenzopalazzo added a commit to rustyrussell/lnprototest that referenced this pull request Sep 21, 2023
Previously, an unexpected behavior in the ln helper function had gone
unnoticed. However, while Rusty Russell was working on [1], he
discovered a bug in lnprototest.

Specifically, within the helper function that opens a channel, there
was an incorrect way of counting the current block height. Ideally,
this logic should be abstracted elsewhere.

This commit addresses the block height counting
error and resolves the bug identified in [1].

[1] ElementsProject/lightning#6628

Link: https://discord.com/channels/899980449231814676/941465665540325397/1154259454716563537
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Vincenzo Palazzo <[email protected]>
vincenzopalazzo added a commit to rustyrussell/lnprototest that referenced this pull request Sep 21, 2023
Previously, an unexpected behavior in the ln helper function had gone
unnoticed. However, while Rusty Russell was working on [1], he
discovered a bug in lnprototest.

Specifically, within the helper function that opens a channel, there
was an incorrect way of counting the current block height. Ideally,
this logic should be abstracted elsewhere.

This commit addresses the block height counting
error and resolves the bug identified in [1].

[1] ElementsProject/lightning#6628

Link: https://discord.com/channels/899980449231814676/941465665540325397/1154259454716563537
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Vincenzo Palazzo <[email protected]>
vincenzopalazzo added a commit to rustyrussell/lnprototest that referenced this pull request Sep 21, 2023
Previously, an unexpected behavior in the ln helper function had gone
unnoticed. However, while Rusty Russell was working on [1], he
discovered a bug in lnprototest.

Specifically, within the helper function that opens a channel, there
was an incorrect way of counting the current block height. Ideally,
this logic should be abstracted elsewhere.

This commit addresses the block height counting
error and resolves the bug identified in [1].

[1] ElementsProject/lightning#6628

Link: https://discord.com/channels/899980449231814676/941465665540325397/1154259454716563537
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Vincenzo Palazzo <[email protected]>
vincenzopalazzo added a commit to rustyrussell/lnprototest that referenced this pull request Sep 21, 2023
Previously, an unexpected behavior in the ln helper function had gone
unnoticed. However, while Rusty Russell was working on [1], he
discovered a bug in lnprototest.

Specifically, within the helper function that opens a channel, there
was an incorrect way of counting the current block height. Ideally,
this logic should be abstracted elsewhere.

This commit addresses the block height counting
error and resolves the bug identified in [1].

[1] ElementsProject/lightning#6628

Link: https://discord.com/channels/899980449231814676/941465665540325397/1154259454716563537
Reported-by: Rusty Russell <[email protected]>
Signed-off-by: Vincenzo Palazzo <[email protected]>
Copy link
Collaborator

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments while I was review the PR to understand what was changed from before

In order to fix lnprototest you should build on top of #6702

The problem was that lnprototest was counting the current block heigh in the wrong way and this mean that core lightning was going out of sync (causing the reorg)

@ddustin
Copy link
Collaborator

ddustin commented Sep 21, 2023

Oh sweet! A clean up of funding depth would be awesome!

@ddustin
Copy link
Collaborator

ddustin commented Sep 21, 2023

Reviewed the parts related to splicing, ACK 659ed0f

While looking at this I've realized in splice confirmation, channel_control is setting channel->scid too early. A peer disconnection and restart of channeld here will cause channeld to boot up with the new scid before mutual_splice_lock, causing scid disagreement. Which is probably one of the contributors to the aggressive splice restart test issue.

That's an existing problem not caused by this refactor -- though fixing it will cause merge conflicts 😅

Edit: I think this was easier to notice because of the cleaner code post refactor -- woo clean code!
Put in issue for reference: #6703

@rustyrussell rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from 661ffbd to e4032d9 Compare September 28, 2023 00:55
@rustyrussell
Copy link
Contributor Author

rustyrussell commented Sep 28, 2023

OK, this needed some rework: there was more to clean up!

In particular, we used funding_depth_cb for both the main funding tx, as well as inflights (DF RBF attempts, splicing). This was deeply confusing, and caused weird issues until I understood it. I've now separated those out.

@rustyrussell rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from db77fc9 to c8d0e93 Compare October 1, 2023 02:58
@rustyrussell rustyrussell force-pushed the refactor-channel-state branch 3 times, most recently from fd1d019 to e49a2dc Compare October 1, 2023 11:02
This is actually a real issue (l1 doesn't see the warning before l2
drops the connection), but it's unrelated to this PR, and will require
another one to fix.

Signed-off-by: Rusty Russell <[email protected]>
Check the exact scids.  Makes it simpler when failures occur.

Signed-off-by: Rusty Russell <[email protected]>
Not just if htlc addition is too slow, make this the default.  dual-open's txabort
is excluded, however.

Signed-off-by: Rusty Russell <[email protected]>
It's a mess right now.

Try to express it as a switch() statement over the states we can be in.

Signed-off-by: Rusty Russell <[email protected]>
This is a workaround, the real fix is to use a different
callback for inflight splice attempts, which comes later.

Signed-off-by: Rusty Russell <[email protected]>
Rename slightly, remove first arg, and make it a noop of
there's no owner on channel.

Signed-off-by: Rusty Russell <[email protected]>
Currently it's half done in funding_depth_cb, and half in
channeld_tell_depth.  It's very confusing as a result,
with splicing, dual-funding and zeroconf.

This does introduce a behaviour change: if a channel is NORMAL and
it gets reorganized, we force close (unless we were the one who funded
it, or it's zeroconf anyway).  This is safer than continuing to use
the channel in this case!

Some tests are changed to zeroconf to make them work, but v2 doesn't
support zeroconf, so that's removed.

Signed-off-by: Rusty Russell <[email protected]>
Take an optional filter function, so callers can say exactly what they want. 

Signed-off-by: Rusty Russell <[email protected]>
We should use capability tests for states (can you add htlcs?) rather than vague
descriptions (are you closing?).

And as much as possible, use switch () statements to force us to think
about all the cases, especially when we add new states!

Signed-off-by: Rusty Russell <[email protected]>
We usually hand times by copy, not by pointer (and if we did, they should
be const!).  I noticed this particularly for the state changed code, but
it goes down to to json_add_timeiso, so I fixed that too.

Signed-off-by: Rusty Russell <[email protected]>
This is the variant of DUALOPEND_OPEN_INIT which you see once
the channel is in the db: we'll be adding it next, but to reduce
clutter the docs are added as a separate commit.

Signed-off-by: Rusty Russell <[email protected]>
…ALOPEND_OPEN_COMMITTED.

The latter is used when we're put in the db, the former is the uncommitted state.
Currently dbid == 0 is used in addition to the state, which is unwieldy.

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Experimental: JSON-RPC: added new dual-funding state `DUALOPEND_OPEN_COMMITTED`
It has the information we need, now.

Signed-off-by: Rusty Russell <[email protected]>
It was a wrapper only used in one place anyway.

Signed-off-by: Rusty Russell <[email protected]>
Don't assume the arg is a channel.

Signed-off-by: Rusty Russell <[email protected]>
We never do this, but we're about to (we always watch before
we broadcast a tx).

We use a `depth` member to avoid calling the callback multiple times
for the same event, but we initialize it to 0.  This means if we
register a watch, and the first thing that happens is that it
reorganizes out, we *don't* make the callback.

Use an impossible value at initialization, instead.

Signed-off-by: Rusty Russell <[email protected]>
We use the *same* callback for the funding tx, as well as for inflight dual-funding txs, as well as inflight splice txs.  This is deeply confusing!

Instead, use explicit cbs for splicing and df.  Once they're locked in, use the normal callback.

Signed-off-by: Rusty Russell <[email protected]>
…ce scid.

We used the original channel funding output number.  I'm not sure if this
was true in the previous code, or a regression I introduced, but it
caused occasonal failures in test_splice_gossip!

Signed-off-by: Rusty Russell <[email protected]>
Now we're not always using the same functions to watch during
dual-funding opening, we need to make sure we're watching the close
(in particular, df close before the opening is confirmed).

So, keep a pointer, and if it's not set in drop_to_chain, set it.

Signed-off-by: Rusty Russell <[email protected]>
…g tx.

We make dualopend_tell_depth static, which means we move it
higher in the file.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the refactor-channel-state branch from e49a2dc to d333f19 Compare October 1, 2023 23:00
@rustyrussell rustyrussell merged commit 72f914a into ElementsProject:master Oct 2, 2023
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants