Pruning problem #5310

Closed
kroese opened this issue Jun 7, 2022 · 29 comments

@kroese

kroese commented Jun 7, 2022

I did a fresh installation of bitcoind with pruning set to 100 GB. I waited until it was completely synced. Then I installed CLN and connected it to the bitcoind node.

The problem is that for the last hour it keeps requesting the same block from 2018 every second in a loop:

UNUSUAL plugin-bcli: /usr/bin/bitcoin-cli -datadir=/data/.bitcoin -rpcconnect=172.17.0.2 -rpcport=8332 -rpcuser=... -rpcpassword=... getblock 00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df 0 exited with status 1

I don't understand why it keeps on trying this same block, since it should realize it's not available after trying only once. I think it heard of this block through channel gossip, since I don't have any channels yet myself.

Besides CLN getting stuck on this block, I think I will have another problem: my graph will miss all channels created more than a year ago. I thought a pruned node would be fully functional, but missing all the old channels is a big downside.

So is my mistake that I should have already started CLN while Bitcoin was still syncing the chain? That way CLN would have had access to the blocks from 2018 that are now pruned. Or is there no solution?

getinfo output

{
   "id": "xxxx",
   "alias": "xxxx",
   "color": "xxxxxx",
   "num_peers": 1,
   "num_pending_channels": 0,
   "num_active_channels": 0,
   "num_inactive_channels": 0,
   "address": [
      {
         "type": "ipv4",
         "address": "xx.xx.xx.xx",
         "port": 9760
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "0.0.0.0",
         "port": 9735
      }
   ],
   "version": "v0.10.2",
   "blockheight": 739674,
   "network": "bitcoin",
   "msatoshi_fees_collected": 0,
   "fees_collected_msat": "0msat",
   "lightning-dir": "/data/.lightning/bitcoin"
}
@vincenzopalazzo
Collaborator

Can you check whether your bitcoin instance has the block 00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df? For pruning we have better alternatives, like https://github.com/clightning4j/btcli4j or the other backends listed in https://github.com/lightningd/plugins

In addition, I think the two problems you have are related; in particular, I think the blockchain has grown by more than 100 GB in the last year.
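
One way to answer the question above is to compare the block's height with the node's prune height. The following is a minimal sketch, not part of CLN or Bitcoin Core: the helper names `bitcoin_cli` and `block_is_pruned` are hypothetical, and it assumes `bitcoin-cli` is on PATH and already configured. It relies on the `pruned` and `pruneheight` fields of `getblockchaininfo` and the `height` field of `getblockheader`.

```python
import json
import subprocess
from typing import Callable, Sequence


def bitcoin_cli(args: Sequence[str]) -> dict:
    """Run bitcoin-cli with the given arguments and parse its JSON output.

    Assumption: bitcoin-cli is on PATH and already configured (datadir,
    RPC credentials); add those flags here if your setup needs them.
    """
    result = subprocess.run(
        ["bitcoin-cli", *args], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


def block_is_pruned(
    block_hash: str, rpc: Callable[[Sequence[str]], dict] = bitcoin_cli
) -> bool:
    """Return True if the block's height is below the node's prune height."""
    info = rpc(["getblockchaininfo"])
    if not info.get("pruned", False):
        return False  # Unpruned node: every block is still on disk.
    # Headers are kept even when block data is pruned, so this call still works.
    header = rpc(["getblockheader", block_hash])
    # "pruneheight" is the lowest height for which full block data is stored.
    return header["height"] < info["pruneheight"]
```

Calling `block_is_pruned("00000000000000000005f7a06bd4efe545999aba00eeff9a49747a3cd1f3c9df")` against the node would tell you whether a plain `getblock` for that hash can be expected to fail.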

@kroese
Author

kroese commented Jun 7, 2022

@vincenzopalazzo No, this block is from 2018 and I only have blocks from the last year.

I am just trying to understand:

  • Why does it need this block? I have zero channels, so it is not needed for my channels. Does it need to verify the opening transaction for EVERY channel in the graph?

  • Why does it keep requesting it for hours? It should just fail once, and ignore the channel I suppose?

I would rather not switch to another backend. I thought pruning was fully supported as long as you make sure C-Lightning does not fall too far behind.

@kroese
Author

kroese commented Jun 7, 2022

I did some more research, and it seems the mistake was indeed to wait until IBD was completed before starting C-Lightning. I should have let them run together while syncing. But this introduces other problems, as C-Lightning syncs slower than bitcoind and can fall too far behind.

The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin Core.

So now my only option is to connect C-Lightning to an external full node (without pruning) to let it validate all the channels in the graph.

That leads me to the final question:

Is it safe to switch C-Lightning from the unpruned node back to the pruned node after it has validated all the channels? And how do I know it has finished validating every channel in the graph, so that I have the guarantee it will never need an old block again?

@jb55
Collaborator

jb55 commented Jun 7, 2022

> The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin Core

this is an interesting idea as a fallback to getblock. cln on pruned nodes has always been a huge pain.

@vincenzopalazzo
Collaborator

> this is an interesting idea as a fallback to getblock. cln on pruned nodes has always been a huge pain.

I'm working on translating it into a compiled language (really compiled).

@vincenzopalazzo
Collaborator

> Is it safe to switch C-Lightning from the unpruned node back to the pruned node after it has validated all the channels? And how do I know it has finished validating every channel in the graph, so that I have the guarantee it will never need an old block again?

I think that if you have old channels you need to verify them, so a very old channel (say, 10 years old) could be a problem. However, I'm not 100% sure about that.

cc @cdecker

@kristapsk
Contributor

> So is my mistake that I should have already started CLN while Bitcoin was still syncing the chain? That way CLN would have had access to the blocks from 2018 that are now pruned. Or is there no solution?

My approach is to start bitcoind and then start CLN while bitcoind is still syncing. I noticed that bitcoind prunes faster than CLN processes blocks, so as a workaround I also keep this script running constantly (https://github.com/kristapsk/cln-scripts/blob/master/cln-prune-protector.sh). It has locking, so just add * * * * * /home/cln/cln-prune-protector.sh 10000 >> /home/cln/cln-prune-protector.log 2>&1 to crontab. It temporarily disables bitcoind network activity if CLN falls too far behind.
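
The idea behind that workaround can be sketched as follows. This is not the linked script, only a minimal illustration under stated assumptions: `run_cli`, `should_pause_network`, `protect_once`, and the `max_lag` parameter are hypothetical names, and both `bitcoin-cli` and `lightning-cli` are assumed to be on PATH and configured. It uses the `getblockcount` and `setnetworkactive` RPCs of bitcoind and the `blockheight` field of CLN's `getinfo`.

```python
import json
import subprocess
from typing import Callable, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return stdout; raise RuntimeError if it fails."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def should_pause_network(bitcoind_height: int, cln_height: int, max_lag: int) -> bool:
    """True when CLN is more than max_lag blocks behind bitcoind."""
    return bitcoind_height - cln_height > max_lag


def protect_once(max_lag: int, run: Callable[[Sequence[str]], str] = run_cli) -> bool:
    """Pause or resume bitcoind networking once; return True if paused."""
    btc_height = int(run(["bitcoin-cli", "getblockcount"]))
    try:
        cln_info = json.loads(run(["lightning-cli", "getinfo"]))
        pause = should_pause_network(btc_height, cln_info["blockheight"], max_lag)
    except RuntimeError:
        # CLN is not reachable: stop bitcoind from syncing (and pruning) ahead.
        pause = True
    run(["bitcoin-cli", "setnetworkactive", "false" if pause else "true"])
    return pause
```

Run periodically (for example from cron), `protect_once` keeps bitcoind from downloading, and therefore pruning, far ahead of what CLN has processed.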

> The best solution would be if C-Lightning just implemented the getblockfrompeer RPC call that was recently added to Bitcoin Core.

Kinda sounds right, but from my experience it would make CLN sync a lot slower, as for most of the sync time it would ask for every block that way.

@jb55
Collaborator

jb55 commented Jun 7, 2022

slow is better than broken. there have been ideas thrown around in the past about using keep-blocks, but then you run into disk space back pressure and might run out of space. I see your script is turning the network on and off... seems a bit extreme, but it's an interesting approach.

@kroese
Author

kroese commented Jun 7, 2022

@kristapsk Yes, I saw your script and really liked it. But since I am running both Bitcoin and C-Lightning in separate docker containers, I would need to heavily modify the script to be able to use it from the host machine.

Also, I am not sure the script makes the process 100% watertight, because that would require a guarantee that CLN received all channel gossip before reaching the related blocks. If it receives gossip about an additional old channel after that, it will still fail to get the block. I don't know if there is a way to be sure that you have received gossip about every channel ever created. And even if there is, there is always the possibility that someone broadcasts a new channel with a very old funding transaction.

@kristapsk
Contributor

kristapsk commented Jun 7, 2022

> I would need to heavily modify the script to be able to use it from the host machine.

Not sure about that. What you need is a working bitcoin-cli and lightning-cli in the CLN container. And CLN itself depends on a working bitcoin-cli, right?

The script was actively turning the network on and off during IBD; since then it hasn't turned it off (but it would if, for example, the CLN service were not running). I have prune=20000 in bitcoin.conf on the specific VPS where I use it.

@wtogami
Contributor

wtogami commented Jun 13, 2022

prune=anynumber is unsafe with CLN for reasons you identified above.

$ bitcoin-cli help pruneblockchain
pruneblockchain height



Arguments:
1. height    (numeric, required) The block height to prune up to. May be set to a discrete height, or to a UNIX epoch time
             to prune blocks whose block time is at least 2 hours older than the provided timestamp.

Result:
n    (numeric) Height of the last block pruned

Examples:
> bitcoin-cli pruneblockchain 1000
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "pruneblockchain", "params": [1000]}' -H 'content-type: text/plain;' http://127.0.0.1:8332/

The dependent app (in this case CLN) should instead be driving bitcoind's pruning with this RPC. With CLN in control of pruning you are never at risk of bitcoind pruning too far ahead.
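
A rough sketch of what "the dependent app drives pruning" could look like from the outside is shown below. It is an illustration only, not how CLN behaves: `run_cli`, `safe_prune_height`, `prune_behind_cln`, and the `keep_blocks` margin are hypothetical, and it assumes bitcoind runs with prune=1 (manual pruning) so that nothing is pruned automatically. It uses the `pruneblockchain` RPC shown above and the `blockheight` field of CLN's `getinfo`.

```python
import json
import subprocess
from typing import Callable, Optional, Sequence


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return stdout; raise RuntimeError if it fails."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def safe_prune_height(cln_height: int, keep_blocks: int) -> int:
    """Highest block that may be pruned while keeping a margin behind CLN."""
    return max(0, cln_height - keep_blocks)


def prune_behind_cln(
    keep_blocks: int, run: Callable[[Sequence[str]], str] = run_cli
) -> Optional[int]:
    """Prune bitcoind only up to keep_blocks below CLN's processed height."""
    cln_height = json.loads(run(["lightning-cli", "getinfo"]))["blockheight"]
    target = safe_prune_height(cln_height, keep_blocks)
    if target == 0:
        return None  # CLN has not progressed far enough to prune anything.
    # pruneblockchain prints the height of the last block pruned.
    return int(run(["bitcoin-cli", "pruneblockchain", str(target)]))
```

Because the prune target is derived from CLN's own height, bitcoind cannot prune blocks that CLN has not yet processed. It does not help with gossip that refers to blocks older than the retained window.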

@kroese
Author

kroese commented Jun 13, 2022

You are right. But even though I made the mistake of letting Bitcoin sync first, it is still a bug that CLN tried to request the same block for hours in a loop.

It would have made much more sense to skip the blocks and ignore the related channels, instead of going into a death loop.

@ghost

ghost commented Jun 13, 2022

I've been running in pruned mode successfully, but periodically it hits this bug. It's weird and possibly caused by malicious gossip, because it always references a block from years ago, when lightning channels were only a glimmer in a nerd's eye.

I've found a workaround, because on its own it seems to get stuck in a loop requesting a block that doesn't exist, and all other node activity slows down. There are a couple of plugins that are meant to make running on a pruned node more reliable*. Although I have never been able to get btcli4j configured to sync properly (it seems unable to fetch blocks), just starting up clightning with that plugin clears the queue for fetching that block and then allows me to start up normally again.

@wtogami
Contributor

wtogami commented Jun 14, 2022

Sounds like a redundant fallback lookup for old blocks would be a perfect plugin.

@kristapsk
Contributor

https://github.com/clightning4j/btcli4j/tree/ecacb049d41e2282c5595e84a6f9db6a601c3bc3

I get "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

@ghost

ghost commented Jun 15, 2022

> https://github.com/clightning4j/btcli4j/tree/ecacb049d41e2282c5595e84a6f9db6a601c3bc3
>
> I get "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

It is outside this repository - it is from the list of community plugins : https://github.com/lightningd/plugins

@vincenzopalazzo
Collaborator

vincenzopalazzo commented Jun 15, 2022

@kristapsk @AutonomousOrganization Just use the master branch https://github.com/clightning4j/btcli4j

> There are a couple of plugins that are meant to make running on a pruned node more reliable*. Although I have never been able to get btcli4j actually configured properly sync (it seems unable to fetch blocks)

I put all my effort into keeping my tool alive and maintained, but I cannot guess the bugs that people run into. If you open an issue, I can help you configure it.

Disclaimer: there isn't really any configuration :) just a flag to run in pruning mode.

@bubelov
Contributor

bubelov commented Nov 22, 2022

Just noticed the same issue on my new pruned node. Is there a reason why the official docs on pruning don't mention this issue? It looks like a common situation with non-negligible negative consequences.

Is it considered a bug or wontfix? Is it safe to ignore it, assuming bitcoind and lightningd agree on a current block height and it's up-to-date?

@djmuhlestein

Just to add to the last comment: most comments in this thread suggest you should start lightningd when you start bitcoind. Is starting a lightning node never allowed for someone who already has a bitcoin client up and running? I tried starting both daemons at the same time but ran into the issue anyway, for reasons I think have already been discussed. Additionally, for running a pruned node I found I could just download a pruned snapshot and start from that rather than waiting to sync the entire blockchain. In both cases, it seems lightningd needs to work around this issue.

@kroese
Author

kroese commented May 13, 2023

One year has passed, and this issue is still not fixed :(

Every few weeks I still run into this endless loop of getblock calls, and restarting does not fix it. I have to point clightning to a non-pruned node to fetch that block, and revert back to using the pruning node immediately afterwards.

I think what happens is that sometimes it hears about a very old block through gossip, which triggers the endless loop.

It would be so easy to fix this: just ignore blocks that fail to fetch after X tries. Or otherwise add an option where you can specify the maximum block age, and don't even try to fetch older blocks. Or the best solution: use the getblockfrompeer RPC call to automatically fetch the missing block from a peer when getblock fails.

@vincenzopalazzo @rustyrussell @cdecker Can one of you please look into this? Any of these three solutions is simple to implement and would solve this issue.
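
The first two proposals above (give up after X tries, and skip blocks older than a maximum age) can be sketched as a small policy function. This is a hypothetical illustration, not CLN code: `fetch_block_bounded`, its parameters, and the injected `fetch` callable (which returns the block bytes or `None` on failure) are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


def fetch_block_bounded(
    block_hash: str,
    block_height: int,
    tip_height: int,
    fetch: Callable[[str], Optional[bytes]],
    max_tries: int = 3,
    max_age_blocks: Optional[int] = None,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Try to fetch a block a bounded number of times, then give up.

    Returns the block, or None so the caller can ignore the related channel
    instead of retrying forever.
    """
    # Proposal 2: do not even try blocks older than the configured maximum age.
    if max_age_blocks is not None and tip_height - block_height > max_age_blocks:
        return None
    # Proposal 1: stop after max_tries failed attempts.
    for attempt in range(max_tries):
        block = fetch(block_hash)
        if block is not None:
            return block
        if attempt + 1 < max_tries:
            sleep(delay_s)
    return None
```

With either limit in place, a block that the pruned backend cannot serve costs a bounded number of `getblock` calls rather than an endless loop.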

@vincenzopalazzo
Collaborator

> Or the best solution: use the getblockfrompeer RPC call to automatically fetch the missing block from a peer when getblock fails.

I will look into this, thanks

> Can one of you please look into this? Any of these three solutions is simple to implement and would solve this issue.

I will!

@vincenzopalazzo vincenzopalazzo self-assigned this May 13, 2023
@vincenzopalazzo vincenzopalazzo added this to the v23.08 milestone May 13, 2023
@kroese
Author

kroese commented May 13, 2023

The problem with this RPC call is that you have to specify the index of the peer (for example the first peer) and you cannot say that you want it from ANY peer (in case the first peer is also pruned just like yourself). So either you have to implement logic to try all peers, or gamble that your first peer is not pruned.

But even if it just tries the first peer, I would be happy already, because in 90 percent of the cases it will work fine.

EDIT: I added a feature request ( bitcoin/bitcoin#27652 ) to make this possible, but until that is implemented just trying the first peer would be fine.
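
The "try all peers" logic mentioned above could look roughly like this. It is a sketch under assumptions, not an existing CLN or Bitcoin Core feature: `bitcoin_cli`, `full_node_peer_ids`, and `request_block_from_peers` are hypothetical helpers, and `bitcoin-cli` is assumed to be on PATH and configured. It uses the `getpeerinfo` RPC (its `id` and `servicesnames` fields) to skip peers that only advertise `NETWORK_LIMITED`, then calls `getblockfrompeer` with each remaining peer id in turn.

```python
import json
import subprocess
from typing import Any, Callable, List, Optional, Sequence


def bitcoin_cli(args: Sequence[str]) -> Any:
    """Run bitcoin-cli and parse its JSON output; raise RuntimeError on failure."""
    proc = subprocess.run(["bitcoin-cli", *args], capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return json.loads(proc.stdout) if proc.stdout.strip() else None


def full_node_peer_ids(peers: List[dict]) -> List[int]:
    """Ids of peers advertising NETWORK, i.e. claiming to serve the full chain."""
    return [p["id"] for p in peers if "NETWORK" in p.get("servicesnames", [])]


def request_block_from_peers(
    block_hash: str, rpc: Callable[[Sequence[str]], Any] = bitcoin_cli
) -> Optional[int]:
    """Ask peers one by one for the block; return the first peer id accepted."""
    for peer_id in full_node_peer_ids(rpc(["getpeerinfo"])):
        try:
            rpc(["getblockfrompeer", block_hash, str(peer_id)])
            return peer_id
        except RuntimeError:
            continue  # This peer was rejected or disconnected; try the next one.
    return None
```

The request is asynchronous, so a successful return only means the request was accepted by bitcoind; the caller would still need to retry `getblock` afterwards, and the block may be pruned again later.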

@vincenzopalazzo
Collaborator

vincenzopalazzo commented May 13, 2023

My intention is to add something experimental to the plugin https://github.com/coffee-tools/folgore. Once we reach a consensus, we can try to integrate it with CLN. The plugin is a good place to experiment.

The original idea is to completely bypass Bitcoin Core if the block is out of range, and fetch the block directly from the network.

@kroese
Author

kroese commented May 14, 2023

There are already multiple plugins that can work around this issue (like your own btcli4j, for example), but I use CLN via a prebuilt Docker container, so I cannot install any plugins.

So my hope was that the issue could be fixed in CLN itself, not by using a different backend through a plugin.

Because it is basically a possible DoS attack: someone can deliberately send me a channel funding message that refers to a very old block and put my node into an endless loop. Switching to a different backend is more like avoiding the problem than fixing it.

@kroese
Author

kroese commented Jun 3, 2023

Ran into this issue again today. Really getting tired of it.

I really don't understand why this won't get fixed:

  • It makes running a pruned node impossible
  • It's a serious issue that puts the node into a death loop when it receives malicious gossip
  • It would be simple to fix, and I already proposed three different ways that would only take a couple of lines of code

And I really appreciate that @vincenzopalazzo is willing to look into this, but bypassing Bitcoin Core via a plugin seems to be complete overkill.

@bubelov
Contributor

bubelov commented Jun 4, 2023

Yep, pruned mode shouldn't be advertised if it isn't working; it gives users wrong expectations and breaks their nodes.

@benjaminchodroff

benjaminchodroff commented Jul 16, 2023

I worked around this issue with CLN and a pruned bitcoind in a Docker environment by using btc-rpc-proxy, which is available as the Docker image blockstream/btc-rpc-proxy:latest.

I have exposed the btc-rpc-proxy Docker port 8331 and mounted a config directory at /data with the following config.toml in it:

bitcoind_user = "hello"
bitcoind_password = "world"
bind_address = "0.0.0.0"
bind_port = 8331
bitcoind_address = "192.168.1.160"
bitcoind_port = 8332

[user.clnuser]
password = "clnpassword"
allowed_calls = [
"createrawtransaction",
"decoderawtransaction",
"decodescript",
"echo",
"estimatefee",
"estimatepriority",
"estimatesmartfee",
"estimatesmartpriority",
"getbestblockhash",
"getblock",
"getblockchaininfo",
"getblockcount",
"getblockhash",
"getblockheader",
"getchaintips",
"getdifficulty",
"getinfo",
"getmempoolinfo",
"getnetworkinfo",
"getrawmempool",
"getrawtransaction",
"gettxout",
"gettxoutproof",
"gettxoutsetinfo",
"sendrawtransaction",
"verifytxoutproof"
]

You can then update the CLN to point to this proxy on 8331 with the clnuser and clnpassword, and it should work with your pruned node while pulling blocks p2p when required.

@rustyrussell rustyrussell modified the milestones: v23.08, v23.11 Jul 31, 2023
@nepet nepet removed this from the v23.11 milestone Dec 3, 2023
@bubelov
Contributor

bubelov commented May 7, 2024

Is the bcli plugin active by default?

If so, shall we close this issue due to #7240 being merged?

@vincenzopalazzo
Collaborator

Correct @bubelov

Fixed by #7240
