Holesky Rescue [INFORMATIONAL] #7040

Open
paulhauner opened this issue Feb 26, 2025 · 4 comments

Comments

@paulhauner
Member

paulhauner commented Feb 26, 2025

Holesky Rescue - Feb 2025

This issue acts as a "front page" for the rescue efforts on Holesky. Holesky failed the Electra upgrade due to a config issue in some ELs.

The config issue in Nethermind, Geth & Besu resulted in an invalid block being justified. This means that for some users it will be impossible to start attesting on the valid chain without being slashed. Lighthouse will natively protect against this slashing; if you are affected your VC will log errors about slashing protection but you will not be slashed.

The community is still developing a unified approach to solving the slashing issue, but it is a strong possibility that these validators are destined to be slashed or leaked out. Even if your validators cannot attest, they can still produce blocks, which is very valuable. We encourage all Holesky validators to try to get their setups online and functioning.

Lighthouse Advice

If you're a Holesky validator, we recommend you use the v7.0.0-beta.1 release.

There are some VC flags that may be relevant to your scenario:

  • --beacon-nodes-sync-tolerances 1000,2000,3000
    • This flag may no longer be necessary due to recent improvements in block frequency.
  • --disable-attesting
    • Only use this flag if you are struggling to sync a CL and EL pair and/or running out of memory. Once you are synced, remove this flag to allow your validator to attest to the canonical chain.

No additional flags are required for the BN.

Disabling Slashing Protection

DO NOT DO THIS UNTIL INSTRUCTED TO DO SO

If your validators won't attest due to CRIT errors from the slashing protection database, you should disable slashing protection in coordination with the rest of the validator set (monitor Discord/Telegram). To disable slashing protection in Lighthouse:

  1. Stop the VC. Delete the slashing_protection.sqlite file located in $datadir/validators. You should also delete slashing_protection.sqlite-journal.
  2. Start the VC again, using the flag --init-slashing-protection. The flag is required to force Lighthouse to start in the absence of a slashing protection DB. A new slashing protection DB will be created, but will not contain any of the historic information that would block your validators from signing.

Once slashing protection is disabled, your validators will most likely get slashed. Our hope is that they contribute to finalizing Holesky before doing so (hence the importance of a coordinated slashing).
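
The two steps above can be sketched roughly as follows. This is a minimal illustration and not an official Lighthouse tool: the helper name remove_slashing_protection and its datadir argument are hypothetical, and the example path in the comments assumes a default-style datadir that may not match your setup. Only run something like this after the VC has been stopped and after the coordinated go-ahead.

```python
from pathlib import Path

# File names taken from step 1 above.
SLASHING_DB_FILES = (
    "slashing_protection.sqlite",
    "slashing_protection.sqlite-journal",
)


def remove_slashing_protection(datadir: Path) -> list[Path]:
    """Delete the slashing protection DB files under <datadir>/validators.

    Hypothetical helper: returns the paths that were actually removed,
    so files that do not exist (e.g. no journal file) are skipped quietly.
    """
    validators_dir = datadir / "validators"
    removed = []
    for name in SLASHING_DB_FILES:
        path = validators_dir / name
        if path.exists():
            path.unlink()
            removed.append(path)
    return removed


# Example usage (the path is an assumption, adjust it to your own $datadir):
#   remove_slashing_protection(Path.home() / ".lighthouse" / "holesky")
# Then start the VC again with the --init-slashing-protection flag (step 2).
```

Returning the list of removed files makes it easy to confirm that the database was really deleted before restarting the VC.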

EL Client Advice

  • Make sure your EL is running a patched version (not relevant for Reth & Erigon, which do not have a config bug).
  • If your EL was affected by the config issue, you may need to drop the EL database and resync.
  • Reach out to your EL team for specific info.
@paulhauner
Member Author

We've just released v7.0.0-beta.1 which bans the invalid block and has some fixes to help sync. We're running this on our validators and have 100k validators online.

https://github.com/sigp/lighthouse/releases/tag/v7.0.0-beta.1

@chong-he
Member

chong-he commented Feb 26, 2025

If you are syncing Holesky, you may experience high memory usage. Tips to reduce memory usage:

  • Remove all flags that are not required for sync, such as --validator-monitor-* (e.g., --validator-monitor-auto) and --gui on the beacon node. In short, use only the necessary flags on the BN:
  --network holesky \
  --execution-endpoint http://localhost:8551 \
  --execution-jwt /secrets/jwt.hex \
  --checkpoint-sync-url url \
  --http
  • Temporarily shut down the VC while the BN is syncing (thanks Paul for the tip). While syncing, the validators can't attest anyway, and shutting down the VC stops it from sending requests to a BN that is already overwhelmed.
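
To decide when it is safe to start the VC again, one option is to poll the BN's sync status. The sketch below assumes the BN was started with --http and listens on http://localhost:5052 (an assumption, adjust to your setup), and uses the standard Beacon API endpoint /eth/v1/node/syncing. The function names are illustrative only.

```python
import json
from urllib.request import urlopen


def fetch_sync_status(base_url: str = "http://localhost:5052") -> dict:
    """Query the standard Beacon API sync endpoint on the BN.

    The default base_url is an assumption, not something from this thread.
    """
    with urlopen(f"{base_url}/eth/v1/node/syncing", timeout=10) as resp:
        return json.load(resp)


def is_synced(syncing_response: dict) -> bool:
    """Return True if the BN reports it is not syncing and its EL is reachable."""
    data = syncing_response["data"]
    # el_offline may be absent on older API versions, so default it to False.
    return not data["is_syncing"] and not data.get("el_offline", False)


# Example usage (requires a running BN with the HTTP API enabled):
#   if is_synced(fetch_sync_status()):
#       print("BN is synced, the VC can be started again")
```

Keeping the parsing separate from the HTTP call means the check can be tried against a saved response without a live node.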

Resync Lighthouse with checkpoint sync if using an EL that was affected by the issue

If you are using an EL that was affected by the issue, chances are the Lighthouse database has been corrupted with the invalid blocks/forks. In this case, delete the Lighthouse database and start a fresh checkpoint sync. An up-to-date URL for the --checkpoint-sync-url flag is: https://checkpoint-sync.holesky.ethpandaops.io/

@MrKoberman

MrKoberman commented Feb 27, 2025

I have developed health and readiness probes for EL and CL which might come in handy. They can be found here: https://github.com/mysteryForge/eth-kit. If you are using k8s, I would recommend cloning the CL config from here and adjusting it for your setup:

https://github.com/MysteryForge/eth-infra/blob/main/modules/k8s-geth-lighthouse/statefulset.tf#L154-L225

@chong-he
Member

chong-he commented Feb 28, 2025

Latest info as of Feb 28 at ~04:00:00 UTC (the time this comment was posted):

A coordinated slashing will happen at slot 3737760 (Feb 28, 15:12:00 UTC). If you have a Holesky node with validators, kindly follow the instructions in the top post under "Disabling Slashing Protection".
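
If you want to cross-check the slot against your local clock, the conversion is simple arithmetic. The sketch below assumes Holesky's genesis time is 1695902400 (2023-09-28 12:00:00 UTC) and that slots are 12 seconds long. Neither value is stated in this thread, so treat them as assumptions.

```python
from datetime import datetime, timezone

# Assumptions (not stated in this thread): Holesky genesis time and slot length.
HOLESKY_GENESIS_TIME = 1695902400  # 2023-09-28 12:00:00 UTC
SECONDS_PER_SLOT = 12


def slot_to_utc(slot: int) -> datetime:
    """Return the UTC start time of the given slot."""
    timestamp = HOLESKY_GENESIS_TIME + slot * SECONDS_PER_SLOT
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def utc_to_slot(when: datetime) -> int:
    """Return the slot that contains the given timezone-aware time."""
    elapsed = int(when.timestamp()) - HOLESKY_GENESIS_TIME
    return elapsed // SECONDS_PER_SLOT


print(slot_to_utc(3737760).isoformat())  # 2025-02-28T15:12:00+00:00
```

Under these assumptions the result matches the time quoted above for slot 3737760.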

If you have already done so, no further action is required other than using the latest client versions and getting the node online.

If you have not disabled slashing protection, you may now proceed to delete the slashing protection database by following the steps in the top post. Then, run the node as usual.

Additional info (from here):

  • The up-to-date Holesky explorer for the correct chain is: https://dora-holesky.pk910.de/
  • You can check if your node is on the correct chain by running the script: https://gist.github.com/samcm/e2da294dab77e93ad0ee0e815580294f
  • If your node has difficulty finding peers, you can add some bootnodes from here: https://hackmd.io/@_iAz6KERTsWIHHNF-wMxAA/r1XlYyickx. To add bootnodes, use the flag --boot-nodes on the beacon node. For example:
    --boot-nodes enr:-PW4QAOnzqnCuwuNNrUEXebSD3MFMOe-9NApsb8UkAQK-MquYtUhj35Ksz4EWcmdB0Cmj43bGBJJEpt9fYMAg1vOHXobh2F0dG5ldHOIAAAYAAAAAACGY2xpZW502IpMaWdodGhvdXNljDcuMC4wLWJldGEuMIRldGgykAGeIa0GAXAA__________-CaWSCdjSCaXCEff1tSYRxdWljgiMphXF1aWM2giMpiXNlY3AyNTZrMaECUiAFSBathSIPGhDHbZjQS5gTqaPcRkAe4HECCk-vt6KIc3luY25ldHMPg3RjcIIjKIR0Y3A2giMog3VkcIIjKA
    (yes, it's long and it's normal)
