Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NBX stops syncing: "Node disconnected for reason: Not sending headers first anymore " #461

Closed
kallerosenbaum opened this issue Mar 5, 2024 · 8 comments

Comments

@kallerosenbaum
Copy link

When I run some tests on regtest with bitcoin core 26.0 and nbx 2.5.0, NBX often suddenly stops syncing with the node. This did not happen on nbx 2.4.1.

info: Configuration:  Data Directory: /datadir                                                                                                                                                                                                            [9/1779]
info: Configuration:  Configuration File: /datadir/RegTest/settings.config                                                                                                                                                                                        
info: Configuration:  Network: Regtest                                                                                                                                                                                                                            
info: Configuration:  Supported chains: BTC                                                                                                                                                                                                                       
info: Configuration:  DBCache: 50 MB                                                                                                                                                                                                                              
info: Configuration:  Network: Regtest                                                                                                                                                                                                                            
info: Configuration:  Supported chains: BTC                                                                                                                                                                                                                       
info: Configuration:  DBCache: 50 MB                                                                                                                                                                                                                              
info: NBXplorer.DatabaseSetup: Postgres services activated                                                                                                                                                                                                        
info: NBXplorer.Indexer.BTC: TCP Connection succeed, handshaking...                                                                                                                                                                                               
info: NBXplorer.Indexer.BTC: Handshaked                         
info: NBXplorer.Indexer.BTC: Testing RPC connection to http://bitcoin:18443/                                                     
Hosting environment: Production                                 
Content root path: /app/                                        
Now listening on: http://0.0.0.0:32832                          
Application started. Press Ctrl+C to shut down.                 
info: NBXplorer.Indexer.BTC: RPC connection successful          
info: NBXplorer.Indexer.BTC: Full node version detected: 260000                                                                  
info: NBXplorer.Indexer.BTC: Has txindex support                
info: NBXplorer.Indexer.BTC: NBXplorer is correctly whitelisted by the node                                                      
info: NBXplorer.Events: BTC: Node state changed: NotStarted => NBXplorerSynching                                                 
info: NBXplorer.Events: BTC: Node state changed: NBXplorerSynching => Ready                                                      
info: NBXplorer.Events: BTC: New block 56ee874f44822bc9d508d68231d12b13d51bb3b86a3f946ea433531e15cbb465 (112)                    
info: NBXplorer.Events: BTC: tpubDDTKfr...xYy8DdpR1SJYe69GAhKC matching confirmed transaction 54a235...24679d (0/0)              
info: NBXplorer.Events: BTC: New block 7388428b4c847bd2d1de38c1f1af074c47530a2e5465223f1d7788d41c0eb6b3 (113)                    
info: NBXplorer.Events: BTC: New block 3a081e622d5bf8119023cfd70d23305607998c48e5f21cb8a6d2cb7c3a61861b (114)                    
info: NBXplorer.Events: BTC: New block 0d5f0e7d289712ea97cab0261d31659a5947b42a6cf77e4752dcb90aeab7ba74 (115)                    
info: NBXplorer.Events: BTC: New block 0c6d9280ae1c4e46aee371dac84132ec1a4e743ccf645f759ff84d00eb9d21a5 (116)                    
info: NBXplorer.Events: BTC: tpubDDZr51...Uys3NCyaaCL7wwPrpfE7 matching unconfirmed transaction 2cf861...1a61af (0/0)            
info: NBXplorer.Events: BTC: New block 0d8b0007742fe46167a0f4cb703c42aeb194ebc5e051ef1112b18720b0fab982 (117)                    
info: NBXplorer.Events: BTC: tpubDDZr51...Uys3NCyaaCL7wwPrpfE7 matching confirmed transaction 2cf861...1a61af (0/0)              
warn: NBXplorer.Indexer.BTC: Node disconnected for reason: Not sending headers first anymore                                     
info: NBXplorer.Events: BTC: Node state changed: Ready => NotStarted                                                             
fail: NBXplorer.Indexer.BTC: Unhandled exception in the indexer, retrying in 0 seconds                                           
System.OperationCanceledException: The peer has been disconnected                                                                
   at NBXplorer.Backends.Postgres.PostgresIndexers.PostgresIndexer.AskNextHeaders() in /source/NBXplorer/Backends/Postgres/PostgresIndexers.cs:line 375
   at NBXplorer.Backends.Postgres.PostgresIndexers.PostgresIndexer.IndexerLoopCore(CancellationToken token) in /source/NBXplorer/Backends/Postgres/PostgresIndexers.cs:line 193
   at NBXplorer.Backends.Postgres.PostgresIndexers.PostgresIndexer.IndexerLoopCore(CancellationToken token) in /source/NBXplorer/Backends/Postgres/PostgresIndexers.cs:line 202
   at NBXplorer.Backends.Postgres.PostgresIndexers.PostgresIndexer.IndexerLoopCore(CancellationToken token) in /source/NBXplorer/Backends/Postgres/PostgresIndexers.cs:line 127
   at NBXplorer.Backends.Postgres.PostgresIndexers.PostgresIndexer.IndexerLoop() in /source/NBXplorer/Backends/Postgres/PostgresIndexers.cs:line 103
info: NBXplorer.Indexer.BTC: TCP Connection succeed, handshaking...                                                              
info: NBXplorer.Indexer.BTC: Handshaked                         
info: NBXplorer.Indexer.BTC: Testing RPC connection to http://bitcoin:18443/                                                     
info: NBXplorer.Indexer.BTC: RPC connection successful          
info: NBXplorer.Indexer.BTC: Full node version detected: 260000                                                                  
info: NBXplorer.Indexer.BTC: Has txindex support                
info: NBXplorer.Indexer.BTC: NBXplorer is correctly whitelisted by the node                                                      
info: NBXplorer.Indexer.BTC: BTC: RPC Wallet loaded             
info: NBXplorer.Events: BTC: Node state changed: NotStarted => NBXplorerSynching                                                 
info: NBXplorer.Events: BTC: Node state changed: NBXplorerSynching => Ready                                                      
info: NBXplorer.Events: BTC: New block 1cd3f8bb3c9bcf0734d62aeaa3af609d701e2285b30e5be212648b4c91dd3ebc (117)                    
info: NBXplorer.Events: BTC: tpubDDZr51...Uys3NCyaaCL7wwPrpfE7 matching confirmed transaction 2cf861...1a61af (0/0)              
info: NBXplorer.Events: BTC: New block 7dca33b0933e89589b01e0069abed0824ef4531d947a42a20fa86027fabec597 (118)                    
info: NBXplorer.Events: BTC: New block 1cddffbe1a1fa0a263f8cb117c94eb54c02985f0ad30fae3e76e36b687ae1605 (119)                    
info: NBXplorer.Events: BTC: New block 0d8b0007742fe46167a0f4cb703c42aeb194ebc5e051ef1112b18720b0fab982 (117)                    
info: NBXplorer.Events: BTC: tpubDDZr51...Uys3NCyaaCL7wwPrpfE7 matching confirmed transaction 2cf861...1a61af (0/0)              
info: NBXplorer.Events: BTC: New block 24c9c4e04c598096dbc6ded57702f6d86a102266374c7fdb8d3c14e62fcf220e (118)              
info: NBXplorer.Events: BTC: New block 742b620cd0583a1714931c326a8e6c65d3b86319c4d29a94aefbe5e9dfbe8ee5 (119)                    
info: NBXplorer.Events: BTC: New block 2edfc95a58a1396b1543aaf67ac1d1a0ae29d38ddd07ececb079416c168c5de9 (120)                    
info: NBXplorer.Events: BTC: New block 5679644646d74e5425da79e3ac3576ec3e82d0d00ff25390f091a5a9ffca90cf (121)                    
info: NBXplorer.Events: BTC: tpubDCTjVF...zAuCHEQkzSg6ST5wFtU8 matching unconfirmed transaction 597e6d...1fb774 (0/0)            
info: NBXplorer.Events: BTC: New block 6b0413ad279003b57735e8451038d6b98257fefe7cde2117ef4c50606bfe666f (122)                    
info: NBXplorer.Events: BTC: tpubDCTjVF...zAuCHEQkzSg6ST5wFtU8 matching confirmed transaction 597e6d...1fb774 (0/0)              
warn: NBXplorer.Indexer.BTC: Node disconnected for reason: Not sending headers first anymore                                     
info: NBXplorer.Events: BTC: Node state changed: Ready => NotStarted 

And then it just hangs there.

The same could happen on nbx 2.4.1 too, but it would resolve after one minute of stalling and then pick up the syncing again, whereas in 2.5.0 it hangs seemingly forever.

Here's my nbx environment variables:

      NBXPLORER_BIND: "0.0.0.0:32832"
      NBXPLORER_NOAUTH: "1"
      NBXPLORER_NETWORK: regtest
      NBXPLORER_CHAINS: btc
      NBXPLORER_EXPOSERPC: "1"
      NBXPLORER_POSTGRES: "User ID=postgres;Password=pass;Host=postgres;Port=5432;Database=nbxplorer;Pooling=true"
      NBXPLORER_MAXGAPSIZE: "5"
      NBXPLORER_MINGAPSIZE: "3"
      NBXPLORER_BTCRPCURL: "http://bitcoin:18443"
      NBXPLORER_BTCRPCUSER: test
      NBXPLORER_BTCRPCPASSWORD: password
      NBXPLORER_BTCNODEENDPOINT: "bitcoin:18444"
      NBXPLORER_NOWARMUP: "1"

And my bitcoin core config:

        chain=regtest
        listen=1
        fallbackfee=0.00001000
        maxtxfee=20000000
        txindex=1
        [email protected]/0
        rpcuser=test
        rpcpassword=password
        rpcallowip=0.0.0.0/0
        zmqpubrawblock=tcp://0.0.0.0:18501
        zmqpubrawtx=tcp://0.0.0.0:18502
        [regtest]
        rpcbind=0.0.0.0
@kallerosenbaum
Copy link
Author

Some more info. This happens during reorgs. I have a test that does the following:

  1. Send a tx to a tracked derivation scheme
  2. Build 4 blocks (B1'-B4')
  3. await >=1 confirmation for the tx
  4. invalidateblock B1'
  5. Build 3 blocks (B1-B3)
  6. reconsiderblock B1'
  7. await >=1 confirmation for the tx

Step 6 is to simulate a reorg from B1'-B3' to B1-B4.

@kallerosenbaum
Copy link
Author

This is probably a duplicate of #409

@NicolasDorier
Copy link
Member

NicolasDorier commented Mar 6, 2024

I can reproduce with your instructions.

Two issues at play here:

  1. Why the exception?
  2. Why the watchdog doesn't reconnect automatically the node.

Since it is easy to reproduce, I should be able to fix it today.

@kallerosenbaum
Copy link
Author

I get the same functional behaviour when step 2 builds 2 blocks and step 5 builds 1 block (ie nbx seemingly stops syncing), but the messages:

warn: NBXplorer.Indexer.BTC: Node disconnected for reason: Not sending headers first anymore                                     
info: NBXplorer.Events: BTC: Node state changed: Ready => NotStarted 

won't appear.

It looks like this.

...
info: NBXplorer.Indexer.BTC: RPC connection successful
info: NBXplorer.Indexer.BTC: Full node version detected: 260000
info: NBXplorer.Indexer.BTC: Has txindex support
info: NBXplorer.Indexer.BTC: NBXplorer is correctly whitelisted by the node
info: NBXplorer.Events: BTC: Node state changed: NotStarted => NBXplorerSynching
info: NBXplorer.Events: BTC: Node state changed: NBXplorerSynching => Ready
info: NBXplorer.Events: BTC: New block 2e5c1f80169172c131d7daf628f56c22fdc18874f76e427fa422be118d8dfac1 (243)
info: NBXplorer.Events: BTC: tpubDCFf12...3rLg2kbGK6ALBRLK9xvs matching confirmed transaction f903ac...a38620 (0/0)
info: NBXplorer.Events: BTC: New block 16146c597f7f39229eb614b66c898f4bc1728b27f2c65a4bd30e34a53e39baa7 (244)
info: NBXplorer.Events: BTC: tpubDDVNva...dmcs4S9p8LiL9hP1BQeZ matching unconfirmed transaction fb101c...2b437a (0/0)
info: NBXplorer.Events: BTC: New block 1da1f8fdb5d6021cf05d90349f52d289ee23ec55967aa8f5d3d7ec2037e6260c (245)
info: NBXplorer.Events: BTC: tpubDDVNva...dmcs4S9p8LiL9hP1BQeZ matching confirmed transaction fb101c...2b437a (0/0)
info: NBXplorer.Events: BTC: New block 67a1225e0e2d39c4b4cfb05206cae3376749e11254a875cd4bcc12bea7777c56 (246)

It hangs here.

@NicolasDorier
Copy link
Member

fixed in v2.5.1

@kallerosenbaum
Copy link
Author

2.5.1 certainly fixes this issue. Great. But thinking more about it, it's just when we do invalidateblock before NBX is aware of B4' that this happens. If we sleep 100ms between steps 3 and 4 (in #461 (comment)) this issue doesn't appear, neither on 2.5.0 nor 2.5.1, and we avoid the annoying 5min stall.

So it might be that this was strictly an academic issue that will never occur in real world, in which case I'm sorry for wasting your time.

@kallerosenbaum
Copy link
Author

Also, this was not a duplicate of #409 so I'm reopening that one. But feel free to close it if you think that it shouldn't be fixed or that it's not a bug.

@NicolasDorier
Copy link
Member

NicolasDorier commented Mar 7, 2024

@kallerosenbaum yes the "Not sending headers first anymore" is if we don't do the sleep, I noticed that. For some reason nodes send INV blocks if they detect a reorg in the getheaders (and that there is no sleep). The tricky thing is that the INV announced are actually blocks that aren't in the chain anymore. I believe this is a bug from bitcoin core to be honest. (I have detected this condition only on doge coin in the past, because I guess they have more reorg)

However, this PR does two things:

  1. Ignore Blocks INV instead of disconnecting
  2. Fix: When a node disconnected during a block download, NBX would be stuck forever

The annoying 5min stall is really an unlucky timing issue, and I don't think there is an easy way to detect this condition. (I have some ideas, but implementing them requires me to write some very tricky code that isn't easily testable, which may contains bugs that are more annoying than the stall)

The 2. point can actually happen in other situations... if the block download is stalling, but also if you decide to restart the server during a block download, then NBX would fail to stop properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants