Use gRPC streams for linera storage service. #3346

Status: Open. Wants to merge 7 commits into base: main.

Conversation

@usagi32 (Contributor) commented on Feb 16, 2025:

Addresses #1793.

@usagi32 (Contributor Author) commented on Feb 16, 2025:

@ma2bd

@@ -53,7 +32,7 @@ message RequestContainsKeys {
 }

 message ReplyContainsKeys {
-  repeated bool tests = 1;
+  repeated bool tests = 1 [packed = true];
Contributor:

Isn't packed the default anyway?

Contributor Author:

Yes, it's the default in proto3; it was not in proto2.

Comment on lines 104 to 108
if value.is_none() {
    return Ok(None);
}

reformed_value.append(&mut value.unwrap());
Contributor:

We can avoid the unwrap (and why the &mut?):

Suggested change:
-    if value.is_none() {
-        return Ok(None);
-    }
-    reformed_value.append(&mut value.unwrap());
+    let Some(value) = value else {
+        return Ok(None);
+    };
+    reformed_value.append(value);

Also, should we log a warning if one of the values is None? That isn't expected behavior, I think?

Contributor Author:

Append takes an exclusive reference?
No, it's fine. None is actually part of the message's optional field: if a key is not found, None is sent and we return early.

Contributor:

We'd just like to avoid unwrap wherever it's easily possible. if let else works just as well here, without the unwrap.

Right, the append could probably be replaced with extend.

Contributor Author:

> We'd just like to avoid unwrap wherever it's easily possible. if let else works just as well here, without the unwrap.

Actually, I was talking about the need to log the None. It's fine and expected.

> Right, the append could probably be replaced with extend.

That would be quite bad: extend iterates and moves things through the stack one element at a time, while append moves the contents in bulk, in place, on the heap. It's roughly O(1) vs O(n), where n is at least six orders of magnitude greater here. Sometimes the compiler inlines and optimizes both down to the same code, but let's not rely on that.
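For illustration only (the function name here is made up, not the PR's): a minimal sketch of the pattern being discussed, where `let ... else` removes the unwrap while `Vec::append`, which indeed takes an exclusive reference, still moves the bytes in bulk.

```rust
/// Hypothetical helper: fold one streamed chunk into the reassembled value.
/// Returning `false` stands in for the early `return Ok(None)` above.
fn push_chunk(reformed_value: &mut Vec<u8>, value: Option<Vec<u8>>) -> bool {
    let Some(mut value) = value else {
        return false;
    };
    // `append` takes `&mut Vec<T>` and moves the contents in one bulk
    // operation, unlike element-by-element `extend`.
    reformed_value.append(&mut value);
    true
}
```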

let mut chunk_size = 0;
for full_key in full_keys {
    if MAX_PAYLOAD_SIZE >= full_key.len() + chunk_size {
        chunk_size += full_key.len();
Contributor:

Technically the chunk size probably also contains the encoded length of the full key, and then the encoded length of grouped_keys?

Not sure if MAX_PAYLOAD_SIZE leaves enough room for that or if we should add them here? (@MathieuDutSik)

(Also below.)

Contributor Author:

Actually, gRPC's protobuf encoding adds a little more than just the length.
There are two extreme cases:

  • Keys close to MAX_KEY_SIZE: then there is far more than enough room in MAX_PAYLOAD_SIZE.
  • Small keys of, say, 10 bytes: then there would have to be hundreds of thousands of keys to test before hitting the limit, and even our synthetic tests don't do that.

Both are unrealistic cases, and there are performance and efficiency trade-offs either way. The extra encoding is also variable per element, since non-primitive types are not packed together: it is tag + length + data, where the tag can be anywhere from 1 to 5 bytes depending on the field number, and the same goes for the length.
A simple per-key allowance of around 10-20 bytes would already dominate the size of very small keys.

Let me know.
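As a rough illustration of that per-key allowance (constants and names here are assumptions, not the PR's actual code):

```rust
// Illustrative constants; the real values live in the PR.
const MAX_PAYLOAD_SIZE: usize = 4_000_000;
const PER_KEY_OVERHEAD: usize = 16; // generous room for the tag and length varints

/// Hypothetical sketch: group keys so that each group's estimated encoded
/// size, including a small per-key overhead, stays under the payload limit.
fn split_into_chunks(full_keys: Vec<Vec<u8>>) -> Vec<Vec<Vec<u8>>> {
    let mut chunks: Vec<Vec<Vec<u8>>> = vec![Vec::new()];
    let mut chunk_size = 0;
    for full_key in full_keys {
        let cost = full_key.len() + PER_KEY_OVERHEAD;
        if chunk_size + cost > MAX_PAYLOAD_SIZE && chunk_size > 0 {
            chunks.push(Vec::new());
            chunk_size = 0;
        }
        chunk_size += cost;
        chunks.last_mut().expect("chunks is never empty").push(full_key);
    }
    chunks
}
```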

Contributor:

The standard size limit for gRPC is 4 MB, but MAX_PAYLOAD_SIZE is 4000000, which leaves room for additional small items to be included.

Contributor Author:

It should be fine for the most part. It could be broken if someone tried to do so intentionally, but I'm sure never in any real-life, practical scenario.
I'll make a few tweaks if I ever feel otherwise.

}
};

let _ = std::thread::spawn(spawner);
Contributor:

Is it really worth doing that in a separate thread? There's no heavy computation going on in there.

Contributor Author:

We have to get to the receiver at the end of the function's body to take advantage of streaming for interleaving the compute-bound and IO-bound operations; preprocessing should be avoided as much as possible.
I only did it that way because there was no future to await in the closure.
It just hit me that the receiver will likely be awaited by tonic, so we can just do this as a normal future.
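A rough sketch of that "normal future" shape (types simplified and names invented for illustration): an async task feeds an mpsc channel, and the wrapped receiver can be returned for tonic to await as the response stream.

```rust
use tokio_stream::wrappers::ReceiverStream;

/// Hypothetical producer; the chunk type and channel capacity are placeholders.
fn stream_chunks(chunks: Vec<Vec<u8>>) -> ReceiverStream<Vec<u8>> {
    let (tx, rx) = tokio::sync::mpsc::channel(16);
    // An async task instead of `std::thread::spawn`: there is no heavy
    // computation here, only handing chunks over to the stream.
    tokio::spawn(async move {
        for chunk in chunks {
            if tx.send(chunk).await.is_err() {
                break; // receiver dropped, e.g. the client went away
            }
        }
    });
    ReceiverStream::new(rx)
}
```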


if 0 == num_messages {
    values.push(Some(value));
    value = Vec::new();
Contributor:

I'd rather declare value and num_messages at the beginning of the while loop's body.

Contributor Author:

Aren't they declared at the beginning already?
It's just a reassignment of the binding after moving the value.

Contributor:

I mean after the while, inside the outer loop's body: That would remove the need to reset it. I'd also find it a bit more readable, because it conceptually tracks the progress of the inner loop.

Contributor Author:

Actually, it needs to persist across the while loop. You may refer to the server module.

self.read_entries(message_index, num_chunks).await

let is_zero = |buf: &[u8]| {
    let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() };
Contributor:

Let's avoid unsafe.
Why not keep using repeated KeyValue?

Contributor Author:

Actually, I was going to remove the unsafe; it was only there during testing, as it's a very fast way to check whether a slice is all zeros. The code should do fine without the condition that uses it.
repeated KeyValue would add unnecessary complexity to the code and would degrade performance and efficiency.
The best approach is to serialize the whole thing in a way that mimics bcs, and deserialize it on-the-fly at the other end.
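For reference, a safe stand-in for the unsafe align_to check (purely illustrative, since the condition is being removed anyway): test every byte directly and let the optimizer vectorize it.

```rust
/// Safe all-zeros check over a byte slice.
fn is_zero(buf: &[u8]) -> bool {
    buf.iter().all(|&b| b == 0)
}
```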

get_random_key_values2(30, 4, 10),
get_random_key_values2(30, 4, 100),
//get_random_key_values2(10, 200, 3_999_700),
Contributor:

Does that value have a particular significance?
Anyway, let's either remove the line or the //.

Contributor Author:

Large key lengths or numbers of keys are exponentially slow in the tests, likely due to the way run_reads is designed.
I just left this in to prompt some thoughts.

Contributor:

No, you should remove the // and add a comment about the run_reads function's complexity. Indeed it is slow.
A test does not need to be super fast, but it should be testing useful functionality.

The code should not be cryptic, and if there is an underlying reason that takes more than 5 minutes to understand, then it should be written out clearly.

Contributor Author:

Yeah, will do. I just wanted others to be aware of this, in case they ever feel the storage service tests are progressing slowly.

@@ -19,7 +19,7 @@ test = ["linera-views/test"]

 [[bin]]
 name = "linera-storage-server"
-path = "src/server.rs"
+path = "src/main.rs"
Contributor:

Why?

Contributor Author:

It was quite handy for me during testing: we get a server library function we can spawn at any time to run tests with #[tokio::test], without building everything.

@afck requested a review from MathieuDutSik on February 19, 2025.
@usagi32 requested a review from afck on February 22, 2025.
@MathieuDutSik (Contributor) left a review:

It is nice progress, but a little cryptic.

Comment on lines +56 to +58
// Another way is to cast &self to raw pointers and use unsafe rust
// that also will completely be ok as tasks are either scoped, joining
// themselves at their functions end or else even then the raw pointers
Contributor:

We do not want to use unsafe code.

Contributor Author:

Yes, that's why I chose the safe approach. The comment was just to let others know.

let mut message = VecDeque::from(message);
while !message.is_empty() || state.get(2).is_some_and(|len| 0 == *len) {
    match state.len() {
        0 => {
@MathieuDutSik (Contributor) commented on Feb 24, 2025:

Given that the codes 0, 1, 2 and 3 are used to encode some meaning, we cannot use an enum here.
But it would be preferable to add some comments.

Contributor Author:

Yes, it would be nice.

Comment on lines +653 to +655
let _ = state.pop();
let _ = state.pop();
let _ = state.pop();
Contributor:

It is a little unfortunate to have the 3 pops, but the alternative with split_off or drain is not much simpler.

Contributor Author:

My purpose was better readability and an understanding of what's going on, to help others visualize it. Since the size of the state vector here is 3, Vec::clear could have been used, but that would have left readers wondering and making assumptions about the manual state tracking. It's better now.
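The alternative mentioned above would look roughly like this (illustrative only; the element type of the state vector is assumed):

```rust
fn reset_state(state: &mut Vec<usize>) {
    // Equivalent to the three pops when `state` holds exactly three entries,
    // but, as noted above, it hides the manual state tracking.
    state.clear();
}
```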
