Excessive memory consumption when syncing a long way up to the canonical head
#3207
Comments
Oops, was the wrong one -- lol |
was somehow related to issue 3202 :) |
fixed by #3204 |
Looks like the problem is not cured thoroughly. Needs more investigation. |
Has the memory usage improved compared to before, or is it exactly the same as before? |
When syncing with hoodi using an empty database, initially everything looks OK and the base can move forward. But when the gap gets wider, the base stops moving. I'm not sure why the CL suddenly requests sync from a head < 10K, then jumps to > 200K+. The problem is that the syncer downloads forward from the known FC base (even though the segment request is reverse), but FC expects the syncer to download backward from the head. Of course the FC expectation is not satisfied by the syncer because the finalized hash (pendingFCU) is not resolved into […]. IIRC, from the Discord discussion we agreed that the syncer has two phases (see the sketch after this comment):
That is what I assumed about how the syncer works. But it looks like it does not work like that. |
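For reference, a minimal sketch of the two-phase scheme discussed above (Python, illustrative only, not actual Nimbus code; the function and field names are invented):

def collect_headers_backwards(target_hash, known_hashes, fetch_header):
    """Phase 1: follow parent links from the sync target down to a locally known block."""
    headers = []
    cursor = target_hash
    while cursor not in known_hashes:
        header = fetch_header(cursor)      # hypothetical header fetch (reverse segment requests)
        headers.append(header)
        cursor = header["parent_hash"]
    headers.reverse()                      # oldest first, ready for import
    return headers

def import_bodies_forwards(headers, fetch_body, import_block):
    """Phase 2: download bodies and import blocks in ascending block order."""
    for header in headers:
        body = fetch_body(header["hash"])  # hypothetical body fetch
        import_block(header, body)         # hands the block over for execution/import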
I observed the same in general although there was an outlier on |
That is exactly how it works apart from the fact that the |
EL=nimbus. Both FC and the syncer expect the CL to give a finalized hash. The above sync sessions happen when I sync with hoodi. The question is: why does the CL send a finalized hash far from the target? Considering this, the syncer cannot just ignore the finalized block if the CL behaves like this. |
Do you have the actual FCs the CL is sending? |
The body download starts with a block number where the header has a parent on the […]. This state (that the collected chain has a […]) […]. My take was that the syncer should (and does) neither know nor care about the finalised hash and its block header resolution. |
To add, in general, the CLs will send fCUs corresponding to whatever they think the current (head, safe, finalized) EL blocks are. They don't, per se, have a notion of "target". |
The name |
Yeah, I understand. But in general, in a well-functioning network, the (head, safe, finalized) epochs in fCU are usually (not always) (…). Is that being seen here? |
Here is what is being seen: H=Head, B=Base, F=Finalized. A few early sessions/short sessions: […] Then the CL will send a very long session: […] During this long session, the CL will gradually update F forward with random steps. Then around F=77K, the CL stops updating F. If the CL keeps updating F, we can formulate a strategy. But because it stops updating, the excessive memory consumption will always repeat. I don't know how other CLs behave. |
If you look at the CL logs (e.g., look at the |
It looks like whether the CL sends a finalized hash depends on the progress of the CL sync. CL-F epoch=2424, CL-H epoch=9341, progress=25.97%. If the CL sees that the EL has already synced past its own progress, it will stop sending a new H and F. I want to propose changes to the EL syncer (a rough sketch follows after this comment):
The reason for this complication is to keep the CL sending a new F without the EL progressing too far beyond the CL's progress percentage. But there is one problem: should D be calculated dynamically, or is it a constant?
Note:
|
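To make the idea concrete, a rough sketch of this kind of gap-based throttling, assuming the EL holds back once it is more than some gap D (taken here as a slot count) ahead of the CL's finalized point; the names, the threshold, and the interpretation of D are assumptions for illustration, not part of the proposal text:

SLOTS_PER_EPOCH = 32  # Ethereum consensus constant

def el_should_keep_importing(el_head_slot, cl_finalized_epoch, gap_d_slots):
    """Illustrative only: keep importing while the EL head is at most
    gap_d_slots ahead of the CL's finalized slot, so the CL keeps
    producing fresh finalized hashes as it catches up."""
    cl_finalized_slot = cl_finalized_epoch * SLOTS_PER_EPOCH
    return el_head_slot < cl_finalized_slot + gap_d_slots

# Numbers from the comments above: CL-F epoch 2424 -> slot 77568,
# head around slot 77642.
print(el_should_keep_importing(77642, 2424, 100))  # True  (within the gap)
print(el_should_keep_importing(77642, 2424, 50))   # False (would hold back)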
this is the epoch number, i.e. 2424*32 = slot 77568, and the head in this log is at 77642 - there is no (significant) gap. the […] |
this is not where the issue lies, generally... i.e. something else is preventing finalized from being updated. There's no reason for the CL to "hold back" finalized updates, but more broadly, the proposed algorithm wouldn't work when the chain is not finalizing - without finality, the gap between H and F is expected to grow (and we'll solve that by keeping block bodies on disk also for non-final blocks). |
It's because of the LC -- the LC gets head but (correctly) doesn't update finalized. This isn't a bug, it's by design. The EL sync should handle it properly. |
That is where the problem is. The CL sends an FCU to the EL:
And this creates a huge gap in the EL. The EL knows nothing about the CL's "synced head". The algorithm will work. If there is no finality, the "B-Syncer" will do nothing; it will keep waiting for a valid F from the CL. |
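For illustration, the forkchoice state the CL hands over only contains hashes (headBlockHash, safeBlockHash, finalizedBlockHash in the Engine API), so an EL with a near-empty database cannot even measure the gap until it resolves them; the hash values and the helper below are made up:

# Illustration only: an FCU carries hashes, not block numbers, so the EL
# cannot anchor on a finalized hash it has never seen.
forkchoice_state = {
    "headBlockHash":      "0xaaaa...",  # the CL's "synced head", possibly 200K+ blocks ahead
    "safeBlockHash":      "0xbbbb...",
    "finalizedBlockHash": "0xcccc...",  # may lag far behind the head
}

def resolvable_fields(state, locally_known_hashes):
    """Hypothetical helper: which FCU fields the EL can resolve to a local block."""
    return {name: h in locally_known_hashes for name, h in state.items()}

# On a pristine database nothing resolves, so the syncer has no usable F
# and the distance between head and base keeps growing.
print(resolvable_fields(forkchoice_state, set()))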
Since PR #3191 the Nimbus EL has an annoying memory problem in the FC module, as the syncer does not update base anymore while importing blocks. This happens at least when the syncer has to catch up a long way.
Previously, there was a kludge related to the syncer which used the forkChoice() function for the base update. Now base can only be updated if the CL triggers a forkChoiceUpdated, which has no effect if the update is out of scope for the FC module, which in turn happens when syncing from an old or pristine database state.
In fact, this leads to a similar situation to when mainnet was unable to finalise transactions globally.
For the attached screenshot, I ran the syncer overnight (with the CL turned off) and had the following memory usage in the morning:
As it seems, a big machine can handle the situation to an extent, but the execution throughput decreases.
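To illustrate the failure mode, a minimal sketch assuming (as described above) that everything imported between base and head is retained in memory until base advances; the class and numbers are invented for illustration and are not the actual FC data structures:

# Illustration of unbounded growth when base never advances during a long catch-up.
class ForkChoiceSketch:
    def __init__(self, base_number: int):
        self.base = base_number
        self.in_memory = {}                 # block number -> payload kept until pruned

    def import_block(self, number: int, payload: bytes):
        self.in_memory[number] = payload    # retained as long as base stays behind

    def update_base(self, new_base: int):
        # Only triggered by an in-scope forkChoiceUpdated; pruning releases memory.
        for n in [n for n in self.in_memory if n <= new_base]:
            del self.in_memory[n]
        self.base = new_base

fc = ForkChoiceSketch(base_number=0)
for n in range(1, 10_001):                  # long catch-up with no base updates
    fc.import_block(n, b"x" * 1_000)        # real blocks are far larger than 1 kB
print(len(fc.in_memory))                    # 10000 blocks still held in memory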