-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POS efficient l1 polling #2506
base: main
Are you sure you want to change the base?
POS efficient l1 polling #2506
Conversation
Closes ##2491 Begin to add chunked fetching of stake table events.
* try to get tables in `get_stake_tables` * if not available get events from contract
And some cleanup
And have it return `Option`
Pass fetch_stake_tables test
Need to check if the limit settings we have right now make sense. Or if we should switch to using subscriptions. For alchemy the limits are documented here: https://docs.alchemy.com/reference/eth-getlogs |
I think for subscriptions there is no way to subscribe to only finalized L1 events. I might be wrong about this but couldn't find it. So we would have to do some extra work to make sure we don't consider anything that isn't finalized. If we use HTTP then I think it should be possible handle the errors we get when we fetch to many events and fetch less. For that I think we can check what HTTP errors are returned from alchemy and infura (we use both providers at the moment) and handle those. |
Hmm, both seem to return HTTP 200 for too large requests, and not the same error code infura
alchemy
So I wonder if we should try like 5 more times with smaller request sizes as long as we do get a successful response from the RPC and then give up, because it's then likely not the response size that is the problem. |
Number's aren't quite lining up yet
state.stake_table_initial_block = init_block; | ||
init_block | ||
} else { | ||
tracing::error!("Failed to retrieve initial block from stake-table contract"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for actually handling the case where we don't have the init block from the stake table? I think we can't handle this case correctly anyway because it means that the stake table contract at this address is not what we expect, or the RPC is having issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to make this function fallible and return an error in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes so we log the error so if its the former, someone can do some investigation. I don't think we want to panic on RPC errors. We could retry indefinitely, but then we aren't handling the first case. Using latest finalized as a fallback doesn't seem to terrible to me.
next: Option<Range<u64>>, | ||
} | ||
|
||
impl ChunkGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can replace this with a much shorter function that has the same behaviour, e.g.
fn chunk_range(range: std::ops::Range<usize>, chunk_size: usize) -> Vec<std::ops::Range<usize>> {
range.clone().step_by(chunk_size)
.map(|start| start..(start + chunk_size).min(range.end))
.collect()
}
I'm also fine leaving it as is but if we can have less code for the same functionality that's nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can. It's not exactly the same thing though. Yours needlessly allocates a Vec
. I don't that is a big deal either, but since we already have the code to avoid that, why not use it?
|
||
// deploy the stake_table contract | ||
let stake_table_contract = | ||
contract_bindings_ethers::permissioned_stake_table::PermissionedStakeTable::deploy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use only alloy bindings in new code. Otherwise it increases the maintenance work for the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I have time to learn a new framework today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be a big change and we have conversion utilities. This PR is already using PermissionedStakeTableInstance
which is from the alloy bindings. But okay, we can deal with it later.
setup_test(); | ||
|
||
let anvil = Anvil::new() | ||
.args(vec!["--block-time", "1", "--slots-in-an-epoch", "1"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if the tests was written such that it would fail if it used non-finalized blocks. Currently I think it wouldn't matter because every block is a finalized block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a specific suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in terms of what we should test, apart from basic correctness of the implementation. One problem would be if we fetched any events that are not finalized. I don't think we have a test that asserts that we are not fetching any non-finalized events. This was an issue / unclear requirement with the original PR which I think tells us that we should have such a test.
types/src/v0/impls/l1.rs
Outdated
sleep(Duration::from_secs(1)).await; | ||
} | ||
// Take block_number from the first receipt. Later | ||
// blocks don't appear to become finalized. Need to wait longer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment confuses me. So this is anvil behaving strangely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually just the last transaction doesn't get inserted into the cache. I'm not sure if it is a bug in the test or the code, but I'm looking into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The event listener in stake_update_loop
never receives an event for the last block. I can trigger it by submitting two transactions later. I think this is how anvil works, but I may be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaces `JoinSet`
And test that is no longer meaningful.
Send extra transactions to trigger finalization in anvil.
Created Issue: #2681 |
sync::{Mutex, Notify}, | ||
task::JoinHandle, | ||
}; | ||
use url::Url; | ||
|
||
use crate::v0::utils::parse_duration; | ||
use crate::{v0::utils::parse_duration, v0_3::StakeTables}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We import from v0_3
in v0_1
. I think here it doesn't really create an issue because it doesn't affect anything we serialize but should we move it around? I think we normally avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we create a new v3::L1State
, we also need a v3::L1Client
. Feels like a rabbit hole. I'll fiddle a bit and see how deep it goes.
I'm question if this PR is doing what we want. If I add a big of logs diff --git a/types/src/v0/impls/l1.rs b/types/src/v0/impls/l1.rs
index f6193bf26c..51beef8e9a 100644
--- a/types/src/v0/impls/l1.rs
+++ b/types/src/v0/impls/l1.rs
@@ -440,6 +440,7 @@ impl L1Client {
let chunks = ChunkGenerator::new(start_block, finalized.number, chunk_size);
let mut events: Vec<StakersUpdated> = Vec::new();
for Range { start, end } in chunks {
+ tracing::debug!("fetching l1 events from {start} to {end}");
match stake_table_contract
.StakersUpdated_filter()
.from_block(start)
@@ -462,6 +463,7 @@ impl L1Client {
{
let st = StakeTables::from_l1_events(events);
let mut state = state.lock().await;
+ tracing::debug!("putting st at block {} {st:?}", finalized.number);
state.put_stake_tables(finalized.number, st)
};
sleep(retry_delay).await;
I get this output from
In order to do this correctly I think we need to keep a cache of events (or better persist them) and then fetch new events, add them to the events we already have and then compute the stake table. Or alternatively implement a function to apply new events to an existing stake table to compute the new stake table. full logs:
|
Closes #2683
This PR:
Asynchronously updates stake table cache by listening for
L1Event::NewFinalized
. This happens in a second update loop thread (added toL1Client.update_tasks
). This task is initialized in first call toget_stake_tables
(which will only be called in POS version). To avoid fetching from l1 block 0, we take the block contract was deployed at as initial block. Subsequent calls will span previous finalized block to that received in theNewFinalized
events.Note that errors retrieving initial block from contract are logged, but they fallback to using the last known finalized block. We might double check this logic to make sure this is acceptable.
Most relevant code is in
types/src/v0/impls/l1.rs
. Tests exist for fetching stake tables from l1 and to ensure stake table cache is being updated from update loop. Therefore, upgrading tasks is tested by implication.There will be some followups for more error handling on the http requests. We should crate issues for this before merge (see comments below).
A generator utility was added as a best of both worlds solution to avoiding some async calls and avoid lifetimes issues behind obtuse
impl Future
types.