peer, main, netsync, blockchain: parallel block downloads #2226
Conversation
Seems like the slowdowns are coming from a single peer that's slowing down the block processing.
Interesting, is the issue that the single peer is assigned blocks uniformly and is always the last one to send? (Haven't looked at the PR in detail yet.) I have this tracking issue in the
cc: @Crypt-iQ @ProofOfKeags for review
Since we can use all the peers we can get for IBD, don't add peers that are not sync candidates while we're still not current.
query.Peer is used for downloading blocks out of order during headers first download. Methods SubscribeRecvMsg() and OnDisconnect() are added to abide by the interface.
ConnectedPeers returns all the currently connected peers. This is used to provide the query.WorkManager with all the currently connected peers from the netsync package.
handleBlockMsg used to check that the block header is valid and then process blocks as they came in, in order. It's now refactored to also handle blocks that arrive out of order. Previously, a block that wasn't the next block from the chain tip would be marked as an orphan; now such a block is temporarily stored in memory until the next block from the chain tip is received, at which point all the buffered blocks that are in sequence are processed (see the sketch after this list).
peerDisconnectMsg is added so that we can access the peerStates map and disconnect a peer given just its address string, without risking concurrent access to the map.
checkpointedBlocksQuery is a helper that creates a []*query.Request which can be passed to query.WorkManager to query multiple peers for wire.Messages. This is useful for downloading blocks out of order from multiple peers during IBD.
peerSubscription is added to Manager, allowing subscribers to receive peers through the channel whenever the Manager learns of a newly connected peer. This is useful to alert query.WorkManager that a newly connected peer is eligible to download blocks from.
ConnectedPeers returns all the currently connected peers, plus any newly connected peer through the returned channel. This method is required by query.WorkManager as it needs to receive peers that it can request blocks from.
The blocks that were requested from headers are now sent over to query.WorkManager, which ranks peers based on their speed and requests blocks from them accordingly. This allows for quicker block downloads since: 1) WorkManager prioritizes faster peers, and 2) WorkManager is able to ask multiple peers at once.
Resetting the requestedBlocks state in headersFirst is problematic since we may be banning peers that are still good.
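As a rough illustration of the out-of-order handling described in the handleBlockMsg commit above, a buffer keyed by the previous block hash could look like the sketch below. The type and field names are hypothetical, not the actual code in this PR:

```go
import (
	"github.com/btcsuite/btcd/btcutil"
	"github.com/btcsuite/btcd/chaincfg/chainhash"
)

// blockBuffer is a hypothetical sketch of buffering out-of-order blocks
// until they connect to the chain tip; it is not the PR's actual code.
type blockBuffer struct {
	// pending holds downloaded blocks that don't yet connect to the
	// tip, keyed by the hash of the block they build on.
	pending map[chainhash.Hash]*btcutil.Block
	// tip is the hash of the current chain tip.
	tip chainhash.Hash
}

// add stores a newly received block and then processes, in order, every
// buffered block that now extends the chain tip.
func (b *blockBuffer) add(blk *btcutil.Block,
	process func(*btcutil.Block) error) error {

	b.pending[blk.MsgBlock().Header.PrevBlock] = blk

	// Drain the buffer for as long as the next block from the tip is
	// already downloaded.
	for {
		next, ok := b.pending[b.tip]
		if !ok {
			return nil
		}
		delete(b.pending, b.tip)
		if err := process(next); err != nil {
			return err
		}
		b.tip = *next.Hash()
	}
}
```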
Pushed all the code that I have and updated the original comment with extra information
This PR modifies netsync.Manager so that all the headers-first blocks before the last checkpoint are downloaded out of order by utilizing query.WorkManager from neutrino. Gonna put it in draft for now as testing is sorta difficult and I'm not convinced it's downloading blocks faster for mainnet. By my testing it works just fine on testnet, but mainnet seems to be slow when downloading blocks. Still identifying where the bottleneck is and will make adjustments accordingly. If anyone else would like to give this a try, please let me know if you see speedups or slowdowns from this PR.
Parallel block download architecture
Block Download Window
We mainly follow the "block download window" model that Bitcoin Core uses. This model
allows the blocks to be downloaded out of order inside the given window.
Given the blocks below, from b01 to b15, with a window size of 4, the download sequence might look like so:
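```
[b01 b02 b03 b04] b05 b06 b07 b08 b09 b10 b11 b12 b13 b14 b15
 ^ window (size 4)
```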
Any one of these blocks can be downloaded in parallel and these blocks are then queued to multiple
peers. Once b01 is downloaded, the block window moves forward, resulting in something that looks
like this:
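```
 b01 [b02 b03 b04 b05] b06 b07 ... b15
      ^ window after b01 completes
```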
If b02 was already downloaded before we've downloaded b01, then the block window will shift by 2,
resulting in something that looks like so:
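```
 b01 b02 [b03 b04 b05 b06] b07 b08 ... b15
          ^ window after b01 completes with b02 already downloaded
```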
The parameters used in the code:
maxBlockDownloadWindow = 1024
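A minimal sketch of the window bookkeeping, assuming completed heights are tracked in a map. The names below are illustrative, not taken from the PR:

```go
const maxBlockDownloadWindow = 1024

// downloadWindow is an illustrative sketch of the sliding window; the
// PR's actual bookkeeping may differ.
type downloadWindow struct {
	start int32          // height of the first not-yet-processed block
	done  map[int32]bool // heights downloaded but not yet processed
}

// inWindow reports whether a block at the given height may be requested.
func (w *downloadWindow) inWindow(height int32) bool {
	return height >= w.start && height < w.start+maxBlockDownloadWindow
}

// markDone records a finished download and slides the window forward
// past every contiguous completed height, e.g. by 2 when b01 completes
// and b02 was already downloaded.
func (w *downloadWindow) markDone(height int32) {
	w.done[height] = true
	for w.done[w.start] {
		delete(w.done, w.start)
		w.start++
	}
}
```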
GetData size per peer
When requesting blocks from peers, we'll use a getdata message. This can have a maximum of 50,000 block hashes.
If we ask for more blocks from a single peer, we can reduce the latency since we only send a single getdata message
for many blocks. However, this may also let a single slow peer hold up the download. By playing around with the parameter,
I came up with a maximum of 32 blocks per peer. This results in an insignificant slowdown
for a single fast peer while still being small enough that one slow peer can't hold us back for long.
The parameters used are:
maxInFlightBlocksPerPeer = 32
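For illustration, splitting the queued hashes into getdata messages of at most 32 entries with btcd's wire package might look like this; the helper itself is hypothetical:

```go
import (
	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/wire"
)

const maxInFlightBlocksPerPeer = 32

// batchGetData splits block hashes into getdata messages of at most
// maxInFlightBlocksPerPeer entries, one message per peer assignment.
func batchGetData(hashes []chainhash.Hash) ([]*wire.MsgGetData, error) {
	var msgs []*wire.MsgGetData
	for i := 0; i < len(hashes); i += maxInFlightBlocksPerPeer {
		end := i + maxInFlightBlocksPerPeer
		if end > len(hashes) {
			end = len(hashes)
		}
		m := wire.NewMsgGetData()
		for j := i; j < end; j++ {
			// AddInvVect only fails past the 50,000-entry protocol
			// limit, far above our 32-entry batches.
			iv := wire.NewInvVect(wire.InvTypeBlock, &hashes[j])
			if err := m.AddInvVect(iv); err != nil {
				return nil, err
			}
		}
		msgs = append(msgs, m)
	}
	return msgs, nil
}
```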
Peer selection
Currently there's no explicit peer-selection logic in btcd itself. There's a PR on the neutrino side that allows
btcd to disconnect from peers when they fail to deliver a block before the 2-second timeout:
lightninglabs/neutrino#308
I've tested different types of peer selection methods, such as:
1: disconnecting peers when they're timing out.
2: disconnecting peers when they're slower than a certain speed (a sketch follows below).
3: a comparative system where the peer that delivers the block slowest 2 consecutive times in a given window is disconnected.
I tried these with various tweaks as well, but the underlying problem seems to be that we're just not connecting to
fast enough peers.
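For example, method 2 above can be as simple as the check below; the threshold is an assumption purely for illustration and none of these numbers are in the PR:

```go
import "time"

// minBytesPerSec is an assumed cutoff purely for illustration.
const minBytesPerSec = 100 * 1024

// tooSlow reports whether a peer's measured block throughput falls
// below the cutoff, marking it for disconnection under method 2.
func tooSlow(bytesReceived int64, elapsed time.Duration) bool {
	if elapsed <= 0 {
		return false
	}
	return float64(bytesReceived)/elapsed.Seconds() < minBytesPerSec
}
```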
The peer selection logic only rids us of slow peers. If we're connected to 8 slow peers (the default number of peers),
then IBD is slow no matter what. I thought this was a problem with btcd itself, but testing on the latest version of Bitcoin
Core showed that it's just as slow.
My conclusion
I believe the current PR that I have is the simplest way of achieving IBD with parallel block downloads.
Since it's not any slower than the latest version of Bitcoin Core, I think it's worthwhile to merge what
we have now and, in follow-up PRs, make changes to the peer selection logic on the connmanager side.