Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M1 DA Light Node #33

Merged
merged 71 commits into from
Apr 25, 2024
Merged

M1 DA Light Node #33

merged 71 commits into from
Apr 25, 2024

Conversation

l-monninger
Copy link
Collaborator

@l-monninger l-monninger commented Apr 17, 2024

Summary

Note: many of the subdirs have READMEs. If you are confused, double-check for one.

Implements RFC M2 DA Light Node [sic].

Core Software Components

Ancillary Software Components

Testing

e2e and integration

Because of the difficulty of dependency management, particularly for Celestia, I recommend running:

nix develop
just m1-da-light-node test.local

This will run e2e and integration tests for the light-node implementation.

If you want to play around with tests yourself, you can use

. ./scripts/celestia-env

to populate an environment which will be amenable to your testing.

You will still likely want to simply run the m1-da-light-node services the background or from another process.

nix develop
just m1-da-light-node local

unit

cargo tests should run successfully without additional orchestration for all crates besides m1-da-light-node-verifier, m1-da-light-node, and m1-da-light-client.

Outstanding Issues

  • celestia-node struggles to deal with multiple attempted ws connections, particularly from the same process. Until this is resolved, cargo-based e2e and integration tests should be run with -- --test-threads=1.
  • Integration work with ENG-187 Extract Aptos execution module for Monza Partial #30 is likely to alter this PR slightly.
  • cel-key may benefit from being Nixified separately from celestia-node.
  • The CI has not been updated.
  • The DA verification logic will occasionally (1/20x) fail. This logic is pulled directly from the Lumia implementation and likely simply reflects network latency or issues with celestia-node. A retry period may be worth implementation.
  • anyhow::Error is used liberally. Custom error types may better serve Typed Error Propagation in the future.
  • Documentation will be enhanced as requested.
  • 6,000 LOC, sorry.

@l-monninger
Copy link
Collaborator Author

@0xmovses it ended up being quite the little odyssey to fix the Nix Flake. In the end, I just abandoned the Makefile and built the go modules directly. It should work now.

service LightNodeService {
// Stream blobs from a specified height or from the latest height.
rpc StreamReadFromHeight (StreamReadFromHeightRequest) returns (stream StreamReadFromHeightResponse);
rpc StreamReadLatest (Empty) returns (stream StreamReadLatestResponse);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More compliance needed:

Suggested change
rpc StreamReadLatest (Empty) returns (stream StreamReadLatestResponse);
rpc StreamReadLatest (StreamReadLatestRequest) returns (stream StreamReadLatestResponse);

and StreamReadLatestRequest defined as an empty message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, for some reason I thought that was canonical. Will change.

@@ -0,0 +1,102 @@
syntax = "proto3";
package movementlabs.protocol_units.da.m1.light_node.v1beta1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sufficiently ramified now. The proto looks good in general.

Comment on lines +19 to +21
message BlobWrite {
bytes data = 1;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaves room for future extensions, I assume.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, some changes may already be rolling in.

@0xmovses
Copy link
Contributor

0xmovses commented Apr 19, 2024

nix develop successfully builds the shell env now. When I run cargo test after nix develop I get:

running 2 tests
test test::e2e::test_submit_and_read ... FAILED
test test::e2e::test_light_node_submits_blob_over_stream ... FAILED

failures:

---- test::e2e::test_submit_and_read stdout ----
Error: transport error

Caused by:
    0: error trying to connect: tcp connect error: Connection refused (os error 111)
    1: tcp connect error: Connection refused (os error 111)
    2: Connection refused (os error 111)

---- test::e2e::test_light_node_submits_blob_over_stream stdout ----
Error: transport error

Caused by:
    0: error trying to connect: tcp connect error: Connection refused (os error 111)
    1: tcp connect error: Connection refused (os error 111)
    2: Connection refused (os error 111)


failures:
    test::e2e::test_light_node_submits_blob_over_stream
    test::e2e::test_submit_and_read

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

@l-monninger
Copy link
Collaborator Author

nix develop successfully builds the shell env now. When I run cargo test after nix develop I get:

running 2 tests
test test::e2e::test_submit_and_read ... FAILED
test test::e2e::test_light_node_submits_blob_over_stream ... FAILED

failures:

---- test::e2e::test_submit_and_read stdout ----
Error: transport error

Caused by:
    0: error trying to connect: tcp connect error: Connection refused (os error 111)
    1: tcp connect error: Connection refused (os error 111)
    2: Connection refused (os error 111)

---- test::e2e::test_light_node_submits_blob_over_stream stdout ----
Error: transport error

Caused by:
    0: error trying to connect: tcp connect error: Connection refused (os error 111)
    1: tcp connect error: Connection refused (os error 111)
    2: Connection refused (os error 111)


failures:
    test::e2e::test_light_node_submits_blob_over_stream
    test::e2e::test_submit_and_read

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

You can't run all the tests directly rn. Hence...

nix develop
just m1-da-light-node-test-native

If we want to make this a requirement for the org, I will look into add startup hooks for the e2e and integration tests.

@l-monninger
Copy link
Collaborator Author

If there's any funny business with a Celestia account not being available, the fund-account process is supposed to handle that. I will need to fix any issue for CI/CD. But, for testing, you can also just manually fund your account here: https://faucet.celestia-arabica-11.com/ (it should error with the account address).

@0xmovses
Copy link
Contributor

Hmm still get an error

just m1-da-light-node-test-native

scripts/movement/test-native m1-da-light-node
Using sequencer storage path: /tmp/nix-shell.oKl3sA/m2-UBri4hRF
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6704    0  6704    0     0  64076      0 --:--:-- --:--:-- --:--:-- 64461
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6704    0  6704    0     0  70384      0 --:--:-- --:--:-- --:--:-- 70568
{"level":"warn","error":"open /home/movses/.config/process-compose/settings.yaml: no such file or directory","time":"2024-04-19T15:35:27+01:00","message":"Error reading settings file /home/movses/.config/process-compose/settings.yaml"}
Cleaning up storage: /tmp/nix-shell.oKl3sA/m2-UBri4hRF
removed directory '/tmp/nix-shell.oKl3sA/m
2-UBri4hRF'
error: Recipe `m1-da-light-node-test-native` failed on line 2 with exit code 255

Snapshot_2024-04-19_15-33-51

@l-monninger
Copy link
Collaborator Author

That's an error in the fund account step. Though it's a little weird that it reports null as the value for you.

Anyways, you can drop that step by commenting out here for the moment:

Then you'll get another error complaining about your Celestia account not existing. You should be able to fix that with the manual step mentioned above.

I will see if I can reproduce.

@l-monninger
Copy link
Collaborator Author

Actually, the later processes I think are only dependent on the "completed" status for now. So, you should be able to do this without commenting out that step and just doing the manual funding.

@0xmovses
Copy link
Contributor

Totally happy to manually test this with steps. But if its not massive amounts of trouble / effort (I'll let you decide), I think it would be beneficial to automate these steps, so that anyone committing code to the repo can run these tests and get some assurance about regressions.

@l-monninger
Copy link
Collaborator Author

That's the idea. I'm suggesting this for now, because I don't know what's happening on your machine.

@l-monninger
Copy link
Collaborator Author

It's just the fund account step that should be manual for you will that issue with the fund-account process persists.

@0xmovses
Copy link
Contributor

I ran celestia light init, used the account address to fund my account here https://faucet.celestia-arabica-11.com/

Then when running the test again I get this

movses@movses-ubuntu:~/mvmt/sdk$ just m1-da-light-node-test-native
scripts/movement/test-native m1-da-light-node
Using sequencer storage path: /tmp/nix-shell.VyjDnl/m2-mMPJ1Rod
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2271    0  2271    0     0  16066      0 --:--:-- --:--:-- --:--:-- 16106
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2271    0  2271    0     0  24795      0 --:--:-- --:--:-- --:--:-- 24956
{"level":"warn","error":"open /home/movses/.config/process-compose/settings.yaml: no such file or directory","time":"2024-04-19T16:13:34+01:00","message":"Error reading settings file /home/movses/.config/proces
s-compose/settings.yaml"}
24-04-19 16:13:34.272 FTL Failed to load project error="circular dependency found in 'm1-da-light-node-verifier-tests'"
Cleaning up storage: /tmp/nix-shell.VyjDnl/m2-mMPJ1Rod
removed directory '/tmp/nix-shell.VyjDnl/m2-mMPJ1Rod'
error: Recipe `m1-da-light-node-test-native` failed on line 2 with exit code 1

Am I missing another step?

@l-monninger
Copy link
Collaborator Author

I think I've stabilized this. As I mentioned in Slack, circular dependency isn't something I had seen. However, when I refactored my script, I saw it pop up once when the the .yaml was just totally wack haha. So, maybe there's some whitespace issue here.

Otherwise, this should work. I am going to add different test network modes though because state syncing takes a while. Generally, in CI/CD, we I think we should run the following.

  1. Test against localized Celestia. This is the fastest. If tests don't pass, consider incorrect.
  2. Test against Mocha after localized passes. If tests don't pass, consider incorrect.
  3. Test against Arabica after localized passes. If tests don't pass, consider warning for future implementations. (Arabica is a preview.)

@l-monninger
Copy link
Collaborator Author

l-monninger commented Apr 20, 2024

The latest commit provides a local runner for Celestia. Check the latest docs. Will add CI in a bit. @0xmovses

@0xmovses
Copy link
Contributor

:shipit:

@l-monninger l-monninger removed the request for review from andyjsbell April 22, 2024 17:45
@l-monninger l-monninger merged commit f54f0c5 into monza Apr 25, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants