
peer, main, netsync, blockchain: parallel block downloads #2226

Open · kcalvinalvin wants to merge 11 commits into master from 2024-04-01-parallel-ibd
Conversation

kcalvinalvin
Collaborator

@kcalvinalvin kcalvinalvin commented Aug 7, 2024

This PR modifies netsync.Manager so that all the header-first blocks before the last checkpoint are downloaded out of order by utilizing query.WorkManager from neutrino.

Gonna put it in draft for now as testing is sort of difficult and I'm not convinced it's downloading blocks faster for mainnet. In my testing it works just fine on testnet, but mainnet seems to be slow when downloading blocks. Still identifying where the bottleneck is and will make adjustments accordingly.

If anyone else would like to give this a try, please let me know if you see speed-ups or slowdowns from this PR.

Parallel block download architecture

Block Download Window

We mainly follow the "block download window" model that Bitcoin Core uses. This model
allows blocks to be downloaded out of order inside a given window.

Given the blocks below, from b01 to b15, if the window size is 4, the download sequence
might look like this:

     window
       |
 -------------
 |           |
b01 b02 b03 b04 b05 b06 b07 b08 b09 b10 b11 b12 b13 b14 b15

Any of these blocks can be downloaded in parallel, and the requests are queued to multiple
peers. Once b01 is downloaded, the block window moves forward, resulting in something that looks
like this:

         window
           |
     -------------
     |           |
b01 b02 b03 b04 b05 b06 b07 b08 b09 b10 b11 b12 b13 b14 b15

If b02 was already downloaded before b01, then the block window shifts by 2,
resulting in something that looks like this:

             window
               |
         -------------
         |           |
b01 b02 b03 b04 b05 b06 b07 b08 b09 b10 b11 b12 b13 b14 b15

The parameter used in the code:

maxBlockDownloadWindow = 1024
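
For illustration, here is a minimal sketch of how such a window could advance as blocks arrive out of order. The type and method names are hypothetical, not the PR's actual code (which drives this through netsync.Manager and query.WorkManager):

```go
package blockwindow

const maxBlockDownloadWindow = 1024

// downloadWindow tracks which heights inside the window have arrived but
// haven't been processed yet.
type downloadWindow struct {
	start      int32              // height of the first block not yet processed
	downloaded map[int32]struct{} // heights received out of order
}

// inWindow reports whether a block at the given height may be requested now.
func (w *downloadWindow) inWindow(height int32) bool {
	return height >= w.start && height < w.start+maxBlockDownloadWindow
}

// markDownloaded records a received block and slides the window past every
// consecutive block starting at w.start, returning the heights that are now
// ready to be processed in order.
func (w *downloadWindow) markDownloaded(height int32) []int32 {
	w.downloaded[height] = struct{}{}

	var ready []int32
	for {
		if _, ok := w.downloaded[w.start]; !ok {
			break
		}
		ready = append(ready, w.start)
		delete(w.downloaded, w.start)
		w.start++ // e.g. receiving b01 after b02 advances the window by 2
	}
	return ready
}
```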

GetData size per peer

When requesting blocks from peers, we'll use a getdata message, which can carry a maximum of 50,000 inventory entries.
Asking a single peer for more blocks reduces latency since we only send one getdata message
for many blocks. However, it also means a single slow peer can hold us back more.

By playing around with the parameter, I settled on a maximum of 32 blocks per peer. This results in an insignificant slowdown
for a single fast peer while still being small enough that a single slow peer can't slow us down too much.

The parameter used is:

maxInFlightBlocksPerPeer = 32
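
As a rough illustration of the batching, here is a hedged sketch that splits block hashes into getdata messages of at most 32 entries each. The helper name is hypothetical; the PR routes requests through query.WorkManager rather than building getdata messages directly like this:

```go
package blockfetch

import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
)

const maxInFlightBlocksPerPeer = 32

// batchGetData splits the given block hashes into getdata messages that each
// request at most maxInFlightBlocksPerPeer blocks, so that no single peer is
// asked for too large a slice of the window at once.
func batchGetData(hashes []chainhash.Hash) []*wire.MsgGetData {
	var msgs []*wire.MsgGetData
	for len(hashes) > 0 {
		n := maxInFlightBlocksPerPeer
		if len(hashes) < n {
			n = len(hashes)
		}

		msg := wire.NewMsgGetData()
		for i := 0; i < n; i++ {
			// Each entry is an inventory vector identifying a block by hash.
			// The error can only trip past 50,000 entries, far above 32.
			msg.AddInvVect(wire.NewInvVect(wire.InvTypeBlock, &hashes[i]))
		}

		msgs = append(msgs, msg)
		hashes = hashes[n:]
	}
	return msgs
}
```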

Peer selection

Currently there's no explicit peer selection logic in btcd itself. There's a PR on the neutrino side that allows
btcd to disconnect from peers when they are not able to deliver a block before the timeout of 2 seconds:
lightninglabs/neutrino#308

I've tested different types of peer selection methods such as:

1: disconnecting peers when they're timing out.
2: disconnecting peers when they're slower than a certain speed.
3: a comparative system where the peer that delivers the block slowest 2 consecutive times in a given window is disconnected.

and tried these with various tweaks as well, but the underlying problem seems to be that we're just not connecting to
fast enough peers. A rough sketch of option 1 follows below.
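
The sketch below shows what option 1 boils down to. The peer interface here is hypothetical, and the actual timeout/disconnect handling is done on the neutrino work manager side (lightninglabs/neutrino#308):

```go
package peersel

import "time"

// blockTimeout mirrors the 2-second delivery timeout mentioned above.
const blockTimeout = 2 * time.Second

// peerConn is a stand-in for whatever peer abstraction is used.
type peerConn interface {
	// QueueBlockRequest asks the peer for a block and signals on done when
	// the block arrives.
	QueueBlockRequest(hash [32]byte, done chan<- struct{})
	Disconnect()
}

// fetchOrDisconnect requests a block and drops the peer if it cannot deliver
// within blockTimeout. It returns true if the block arrived in time.
func fetchOrDisconnect(p peerConn, hash [32]byte) bool {
	done := make(chan struct{}, 1)
	p.QueueBlockRequest(hash, done)

	select {
	case <-done:
		return true
	case <-time.After(blockTimeout):
		p.Disconnect()
		return false
	}
}
```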

The peer selection logic only eliminates slow peers. If we're connected to 8 (the default number of peers) slow peers,
then IBD is slow no matter what. I thought this was a problem with btcd itself, but testing against the latest version of Bitcoin
Core showed that it's just as slow.

My conclusion

I believe the current PR that I have is the simplest way of achieving IBD with parallel block downloads.
Since it's not any slower than the latest version of Bitcoin Core, I think it's worthwhile to merge what
we have now and make changes to the peer selection logic on the connmanager side in follow-up PRs.

@coveralls

coveralls commented Aug 7, 2024

Pull Request Test Coverage Report for Build 12081011051

Details

  • 13 of 426 (3.05%) changed or added relevant lines in 6 files are covered.
  • 14 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.4%) to 56.866%

Changes Missing Coverage        Covered Lines   Changed/Added Lines   %
peer/peer.go                    2               18                    11.11%
server.go                       0               18                    0.0%
blockchain/accept.go            10              33                    30.3%
netsync/blocklogger.go          0               54                    0.0%
netsync/manager.go              0               302                   0.0%

Files with Coverage Reduction   New Missed Lines   %
peer/peer.go                    5                  73.37%
netsync/manager.go              9                  0.0%

Totals Coverage Status
Change from base Build 11941199528: -0.4%
Covered Lines: 29895
Relevant Lines: 52571

💛 - Coveralls

@kcalvinalvin
Collaborator Author

Seems like the slowdowns are coming from a single peer that's slowing down the block processing. peerWorkManager having the capability to completely disconnect a peer would help with this.

@Roasbeef
Member

> Seems like the slowdowns are coming from a single peer that's slowing down the block processing. peerWorkManager having the capability to completely disconnect a peer would help with this.

Interesting, is the issue that the single peer is assigned blocks uniformly and is always the last one to send? (Haven't looked at the PR in detail yet.) I have a tracking issue in the neutrino repo for adding things like dynamic tuning, better work assignment, and also work stealing. With work stealing, the faster peers would steal block requests from the work queue of the slow peer, so at worst we're only as slow as our fastest peer.

@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch 2 times, most recently from c51d31a to 2cbb562 on September 6, 2024 05:51
@kcalvinalvin kcalvinalvin marked this pull request as ready for review September 9, 2024 23:41
@saubyk
Collaborator

saubyk commented Oct 3, 2024

cc: @Crypt-iQ @ProofOfKeags for review

@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch 2 times, most recently from 8f28947 to 06209a4 on November 29, 2024 07:56
@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch from 06209a4 to d5b78aa on December 10, 2024 04:18
candidate

Since we can use all the peers we could get for ibd, don't add peers
that are not sync candidates when we're still not current.
query.Peer is used for downloading blocks out of order during headers
first download.  Methods SubscribeRecvMsg() and OnDisconnect() are added
to abide by the interface.
ConnectedPeers returns all the currently connected peers.  This is used
to provide the query.WorkManager with all the currently connected peers
from the netsync package.
handleBlockMsg used to check that the block header is valid and then
process blocks as they came in.  It's now refactored so that it also
handles blocks that arrive out of order.  Previously, handleBlockMsg
would mark an out-of-order block as an orphan; it's now refactored to
handle those cases.

Whenever a block that's not the next from the chain tip is received,
it's now temporarily stored in memory until the next block from the
chain tip is received.  And then all the blocks that are in sequence are
processed.
peerDisconnectMsg is added so that we can access the peerStates map and
disconnect peers with just a string of their address without risking a
concurrent access of the map.
checkpointedBlocksQuery is a helper to create []*query.Request which can
be passed off to query.WorkManager to query multiple peers for
wire.Messages.  This is useful for downloading blocks out of order from
multiple peers during ibd.
peerSubscription is added to Manager, which allows subscribers to
receive peers through a channel whenever the Manager becomes aware of a
newly connected peer.  This is useful to alert query.WorkManager that a
newly connected peer is eligible to download blocks from.
ConnectedPeers returns all the currently connected peers, plus any
newly connected peer through the returned channel.  This method is
required by query.WorkManager as it needs to receive peers that it can
request blocks from.
The blocks that were requested from headers are now sent over to
query.WorkManager, where it will rank peers based on their speed and
request blocks from them accordingly.  This allows for quicker block
downloads as:
1: WorkManager will prioritize faster peers.
2: WorkManager is able to request blocks from multiple peers.
Resetting the requestedBlocks state in headersFirst is problematic since
we may be banning peers that are still good.
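
To illustrate the ConnectedPeers/peerSubscription mechanism described in the commits above, here is a hedged sketch of the general pattern. The names and channel semantics are illustrative only, not the PR's exact code or the query.WorkManager interface:

```go
package netsketch

import "sync"

// Peer is a placeholder for the peer type handed to the work manager.
type Peer struct {
	Addr string
}

// peerNotifier fans out newly connected peers to subscribers such as the
// work manager.
type peerNotifier struct {
	mtx   sync.Mutex
	peers []*Peer
	subs  []chan *Peer
}

// ConnectedPeers returns the peers connected right now plus a channel on
// which every subsequently connected peer will be delivered.
func (n *peerNotifier) ConnectedPeers() ([]*Peer, <-chan *Peer) {
	n.mtx.Lock()
	defer n.mtx.Unlock()

	current := make([]*Peer, len(n.peers))
	copy(current, n.peers)

	ch := make(chan *Peer, 8) // buffered so a slow subscriber doesn't block us
	n.subs = append(n.subs, ch)
	return current, ch
}

// onNewPeer records a freshly connected peer and notifies every subscriber.
func (n *peerNotifier) onNewPeer(p *Peer) {
	n.mtx.Lock()
	defer n.mtx.Unlock()

	n.peers = append(n.peers, p)
	for _, ch := range n.subs {
		select {
		case ch <- p:
		default: // drop the notification rather than block the caller
		}
	}
}
```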
@kcalvinalvin
Collaborator Author

Pushed all the code that I have and updated the original comment with extra information

@saubyk saubyk added this to the v0.25 milestone Feb 18, 2025