Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Node] Add retry on DA connection #1054

Merged
merged 3 commits into from
Feb 25, 2025

Conversation

musitdev
Copy link
Contributor

@musitdev musitdev commented Feb 13, 2025

Summary

  • RFCs: Link to RFC, Link to RFC, or $\emptyset$.
  • Categories: any of protocol-units, networks, scripts, util, cicd, or misc.

Sometime during the docker container start, the full node start too early and the DA light node didn't have opened its entry point. It results in an error during the full node start, then a restart of the whole stack.
This scenario can repeat several times until the timing is correct.
This PR add a retry for this connection so that full node start is less sensible to timing issues during the start.

I've changed the error message when the config can't be parsed.

Changelog

  • Ad a retry when the Http2 connection fails during init.
  • add a better error message when the config can't be parsed

Testing

To test, you can start a leader and its follower:

CELESTIA_LOG_LEVEL=FATAL nix develop --extra-experimental-features nix-command --extra-experimental-features flakes --command bash  -c "just movement-full-node native build.setup.eth-local.celestia-local.test-followers --keep-tui"

The test execution should have no errors.

Outstanding issues

@0xmovses
Copy link
Contributor

Thanks for the update on the error message, that is greatly needed.

Copy link
Contributor

@andygolay andygolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test and it passed; if the consensus is that the retry logic is needed then I'm willing to approve. Once concern is to have a "service available" condition in the orchestration to prevent this problem that causes the need for retry logic... maybe this could be something to add in a next iteration.

@mzabaluev
Copy link
Collaborator

Please fix the test workflow to not reference a nonexistent package.

@musitdev
Copy link
Contributor Author

Fixed, there's some test that still fail like executor::execution::tests::test_execute_block_state_get_api but it's not related to this PR.

@mzabaluev
Copy link
Collaborator

Yes, I see the same failure on #1038, let's fix it separately.

@radupopa369 radupopa369 merged commit 50612de into main Feb 25, 2025
116 of 123 checks passed
mcmillennick pushed a commit that referenced this pull request Mar 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants