Split up IPC messages #3868
Conversation
```c
}

free(part_text);

if (flags & crm_ipc_server_free) {
    pcmk_free_ipc_event(iov);
}
```
I've found this patch useful for debugging purposes, but we don't necessarily have to keep it. One annoyance is that our logging cuts off after a certain number of characters, so very long XML messages won't get logged in their entirety anyway. Still, it's usually enough to see just the beginning to know what's going on.
These test failures appear to be somewhat transient. I can't reliably reproduce them on every test every time, but they will happen fairly frequently. It's difficult to tell from looking at the cts-lab logs which failures represent real errors and which are merely due to the lab killing things and restarting VMs.
This was fixed by the following:
Force-pushed from ce2a405 to 15e03aa.
Rebased onto main to reflect that I merged the compression PR, no other code changes.
VERY partial. Just getting these early comments out while I continue reviewing.
lib/common/ipc_client.c
```c
    g_byte_array_free(client->buffer, TRUE);
}

client->buffer = g_byte_array_sized_new(client->buf_size);
```
> Second, because there's no way to clear out a GByteArray and insert new bytes directly at the beginning, we just need to allocate and free it every time we do a read.
Are we sure about that? Looks like setting the size to 0 has the effect of truncating the array. (I checked the implementation to confirm.)
https://docs.gtk.org/glib/type_func.ByteArray.set_size.html
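For instance, reusing one array across reads would just be (a minimal sketch, not the PR's code):

```c
#include <glib.h>

/* Truncate a GByteArray in place: the allocated segment is kept, but the
 * length drops to 0, so the next g_byte_array_append() starts over at
 * the beginning of the array. */
static void
reset_buffer(GByteArray *buffer)
{
    g_byte_array_set_size(buffer, 0);
}
```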
```diff
-    client->buffer[0] = 0;
-    client->msg_size = qb_ipcc_event_recv(client->ipc, client->buffer,
+    buffer = pcmk__assert_alloc(crm_ipc_default_buffer_size(), sizeof(char));
```
We could make buffer a guint8 * and allocate it with g_malloc0() (the GLib equivalent of calloc(), with an assertion on memory error). Libqb expects a void *, so it doesn't care about the type of buffer.

Then, instead of calling g_byte_array_append(), we call g_byte_array_new_take(buffer, client->msg_size) and skip freeing buffer on success. This avoids the copy by transferring ownership of the existing buffer to the byte array.
Now, this only works when creating the byte array. As of this commit, we're only appending to it once unless we truncate it first. If we append more data later without truncating (as I expect we will do), then there's no way for the byte array to take ownership of the data in the buffer in those later appends. Though we could either reuse the fixed-size buffer without freeing it, or do some realloc trickery.
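A minimal sketch of that ownership transfer (reusing the names from the discussion above; an illustration, not the PR's code):

```c
#include <glib.h>
#include <qb/qbipcc.h>

/* Read into a g_malloc0()'d buffer, then hand the allocation to a new
 * GByteArray instead of copying it with g_byte_array_append(). */
static GByteArray *
read_event_take(qb_ipcc_connection_t *ipc, size_t buf_size, int32_t timeout_ms)
{
    guint8 *buffer = g_malloc0(buf_size);
    ssize_t rc = qb_ipcc_event_recv(ipc, buffer, buf_size, timeout_ms);

    if (rc <= 0) {
        g_free(buffer);
        return NULL;
    }

    /* The byte array takes ownership of buffer; do not free it here. */
    return g_byte_array_new_take(buffer, (gsize) rc);
}
```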
Another thought: I'm not especially concerned about dynamic memory allocation, but we could avoid it here by using an internal macro constant for the buffer size instead of calling crm_ipc_default_buffer_size() in these places. Then we could just have a static array for the buffer.
Up to you. I see the appeal of wanting to use that function consistently even internally, but the buffer size is always the same now.
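That might look something like the following (the macro name and value are invented for illustration):

```c
/* Hypothetical internal constant; crm_ipc_default_buffer_size() would
 * return the same value. The 128 KiB figure is illustrative. */
#define PCMK__IPC_BUFFER_SIZE (128 * 1024)

static char buffer[PCMK__IPC_BUFFER_SIZE];
```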
lib/common/ipc_server.c
```diff
-    qb_rc = qb_ipcs_event_sendv(c->ipcs, event, 2);
+    do {
+        qb_rc = qb_ipcs_event_sendv(c->ipcs, event, 2);
+    } while (qb_rc == -EAGAIN);
```
https://projects.clusterlabs.org/T322 is relevant here.
Another partial review
This has the built-in advantage of various library functions that can be used to manipulate it instead of using pointer arithmetic. For the moment this isn't important, but it will become important later when we're splitting up IPC messages.

On the other hand, there are a couple of other changes here that are not advantages. First, instead of reading directly into the GByteArray, we now have to allocate a buffer to read into and then append that to the GByteArray. This also means we need to be careful about freeing that buffer. Second, because there's no way to clear out a GByteArray and insert new bytes directly at the beginning, we just need to allocate and free it every time we do a read. This also means we need to be more careful about checking whether it's NULL. This will be less of a problem later, when we are reading into the same GByteArray multiple times for a split-up message.
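A condensed sketch of the read path described here (the function name is invented; the fixed-size temporary buffer and the append/free obligations come from the commit message):

```c
#include <glib.h>
#include <qb/qbipcc.h>

/* One read: receive into a temporary buffer, copy the received bytes
 * into the growable GByteArray, then free the temporary buffer. */
static ssize_t
read_into_byte_array(qb_ipcc_connection_t *ipc, GByteArray *dest,
                     size_t buf_size, int32_t timeout_ms)
{
    guint8 *buffer = g_malloc0(buf_size);
    ssize_t rc = qb_ipcc_event_recv(ipc, buffer, buf_size, timeout_ms);

    if (rc > 0) {
        g_byte_array_append(dest, buffer, (guint) rc);
    }

    /* The bytes were copied into dest, so always free the temporary. */
    g_free(buffer);
    return rc;
}
```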
...instead of an xmlNode. The idea here is that, to deal with split-up messages, this function can be called repeatedly from a loop. Each call will prepare the next chunk of the message to be sent. The first step in doing that is simply changing the type of the parameter. This does result in a little code duplication in the callers, but there are only a couple.
Functions should return this error code if a single IPC message is too large to fit into a single buffer and so has to be split into multiple chunks. It can also be returned when one chunk of a split-up message has been read, indicating that the caller needs to continue reading.
The function now prepares only as much of the message for one I/O vector as will fit in a single buffer. The idea is that you can call the function repeatedly in a loop, preparing and transmitting individual chunks of a single XML message. As long as pcmk__ipc_prepare_iov returns pcmk_rc_ipc_more, you know there's more that needs to be prepared. Keep a running count of the bytes it has prepared and pass that in as the offset. Note that clients are not doing this yet, so very large IPC messages are still going to fail.
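In caller terms, the intended loop might look like this (the pcmk__ipc_prepare_iov signature here is assumed for illustration; only the "loop while it returns pcmk_rc_ipc_more, passing a running byte count as the offset" contract comes from the description above):

```c
/* Sketch of the intended calling pattern, with an assumed signature. */
static int
send_in_chunks(uint32_t request_id, const char *message)
{
    size_t offset = 0;
    int rc = pcmk_rc_ipc_more;

    while (rc == pcmk_rc_ipc_more) {
        struct iovec *iov = NULL;
        ssize_t bytes_prepared = 0;

        /* Hypothetical: prepare the next chunk starting at offset. */
        rc = pcmk__ipc_prepare_iov(request_id, message, offset, &iov,
                                   &bytes_prepared);
        if ((rc != pcmk_rc_ok) && (rc != pcmk_rc_ipc_more)) {
            return rc;  /* real error */
        }

        transmit(iov);              /* placeholder for the send step */
        offset += bytes_prepared;   /* running count of prepared bytes */
    }
    return pcmk_rc_ok;
}
```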
This does technically add flags to a public enum, but on the other hand they are added after the "these are for pacemaker's internal use only" comment, and we are hoping to eventually make this entire enum private.
We could do this in the callers, but this function has everything it needs to decide which flags should be set.
…ges. If the IPC message is too large to fit in a single buffer, this function now knows how to loop over the message and send it one chunk at a time. This handles the sending side for servers that use the newer style IPC code. This also takes care of sending IPC events from a server.
The libqb read buffer is now a fixed size, so we can just use crm_ipc_default_buffer_size() anywhere we need that. And anywhere we need to know the size of the buffer (nowhere at the moment, but possibly in the future), we can just use GByteArray->len to figure it out. So there's no need to keep an extra struct member around anymore.
This function can be called in a loop on the receive side of new-style IPC code to build up a complete IPC message from individual parts.
We create a new fixed-size temporary buffer inside crm_ipc_send and receive from libqb into that, in order to keep changes to a minimum. Then we append the contents of that temporary buffer to the client's IPC buffer, which is allowed to grow as more of the message is received. Also note that crm_ipc_send has an extra receive block at the top for reading and discarding replies that previously timed out. This block also has to be modified to handle multipart messages, but since we are just throwing those away, there's not much to do.
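Roughly, the receive side described here looks like this (read_one_chunk is a hypothetical stand-in for the receive-and-append step; only the loop shape comes from the text):

```c
/* Reassemble a possibly multipart reply into one growable buffer. */
GByteArray *reply = g_byte_array_new();
int rc;

do {
    /* Hypothetical helper: receives one chunk into a fixed-size
     * temporary buffer and appends it to reply. */
    rc = read_one_chunk(client, reply);
} while (rc == pcmk_rc_ipc_more);

if (rc == pcmk_rc_ok) {
    /* reply->data and reply->len now cover the complete message. */
}
```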
This is similar to internal_ipc_get_reply, except it deals with IPC events and doesn't use timeouts.
We need this for the same reason that there's a similar member in crm_ipc_s - we need some place to store a potentially split up IPC message while we are reading it piece-by-piece.
...at least, when it's sending a server event. In this case, the iov will be added to the send queue and EAGAIN will be returned. We will attempt to send the event again the next time pcmk__ipc_send_iov is called, or when the timer pops to flush the event queue. Thus, we are already handling the EAGAIN return code properly elsewhere and just need to proceed as if everything will be fine.
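From the caller's side, that amounts to something like this (a sketch; the exact return-code convention is an assumption):

```c
int rc = pcmk__ipc_send_iov(c, iov, flags);

if (rc == EAGAIN) {
    /* Per the above: the iov has already been added to the send queue
     * and will be retried on the next pcmk__ipc_send_iov call or when
     * the flush timer pops, so proceed as if the send succeeded. */
    rc = pcmk_rc_ok;
}
```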
header->size doesn't mean the same thing anymore now that we have split-up IPC messages. If pcmk__client_data2xml is given an IPC message that did not have to be split up, the size member is the same as the size of the entire message. However, if it was given an IPC message that had to be split up for transmission and then reassembled, the size member only refers to the size of the last chunk received. Thus, checking for a NUL character at the position given by size would have us checking somewhere in the middle of the reassembled string, which is obviously wrong.
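A sketch of the distinction (buffer stands for the reassembled byte array; the error handling is illustrative):

```c
/* header->size only describes the last chunk of a reassembled message,
 * so indexing the text by it can land in the middle of the full string.
 * Check the final byte of the reassembled buffer instead. */
if ((buffer->len == 0) || (buffer->data[buffer->len - 1] != '\0')) {
    return EBADMSG;  /* hypothetical error handling */
}
```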
This is basically how every daemon is going to work. The daemon needs to delay calling pcmk__client_data2xml until it has the complete message. The hard work of reassembling the message and determining whether it's a multipart message in the first place is already done elsewhere.
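Schematically, a daemon's dispatch callback then takes this shape (message_is_complete is a hypothetical helper; only the "delay parsing until the message is complete" rule comes from the text):

```c
static int32_t
dispatch(qb_ipcs_connection_t *qbc, void *data, size_t size)
{
    pcmk__client_t *client = pcmk__find_client(qbc);

    /* Hypothetical helper: stores this chunk and reports whether the
     * complete (possibly multipart) message has arrived yet. */
    if (!message_is_complete(client, data, size)) {
        return 0;  /* wait for more chunks before parsing anything */
    }

    /* Only now is it safe to call pcmk__client_data2xml() and handle
     * the reassembled request. */
    return 0;
}
```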
I've updated this in various ways that make it worth looking at from the beginning again:
I have one concern here that comes up occasionally in testing. I've been setting the buffer size to something small, like 16k, to ensure that messages get split up so I can test all the splitting/combining code. This results in a lot more IPC traffic, which means a lot more EAGAIN return codes, and also sometimes means things don't work; for instance, remote nodes sometimes get hung when you try to run

On the one hand, this is kind of an invented error. The buffer size is never going to be that small; I've created this problem by making it easier to test. On the other hand, I'm concerned that with a very large cluster we're basically going to see the same problem, where there's a lot of IPC traffic, just with large messages instead of small ones.
This PR currently causes cts-lab failures in the Reattach and Remote* tests, which I am still in the middle of tracking down.