TCTI fails with “Resource temporarily unavailable” (EAGAIN) errors #2949
Comments
Probably a somewhat better workaround, which invokes `socket_poll()` when a read or write fails with `EAGAIN` or `EWOULDBLOCK`:
diff --git a/src/util-io/io.c b/src/util-io/io.c
index 1dc1cdc7..68bd933e 100644
--- a/src/util-io/io.c
+++ b/src/util-io/io.c
@@ -51,7 +51,8 @@ size_t
read_all (
SOCKET fd,
uint8_t *data,
- size_t size)
+ size_t size,
+ int timeout)
{
ssize_t recvd;
size_t recvd_total = 0;
@@ -69,6 +70,11 @@ read_all (
#else
TEMP_RETRY (recvd, read (fd, &data [recvd_total], size));
if (recvd < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ if (socket_poll(fd, SOCKET_POLL_RD, timeout) == TSS2_RC_SUCCESS) {
+ continue;
+ }
+ }
LOG_WARNING ("read on fd %d failed with errno %d: %s",
fd, errno, strerror (errno));
return recvd_total;
@@ -91,7 +97,8 @@ size_t
write_all (
SOCKET fd,
const uint8_t *buf,
- size_t size)
+ size_t size,
+ int timeout)
{
ssize_t written = 0;
size_t written_total = 0;
@@ -117,6 +124,11 @@ write_all (
#ifdef _WIN32
LOG_ERROR ("failed to write to fd %d: %s", fd, strerror (WSAGetLastError()));
#else
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ if (socket_poll(fd, SOCKET_POLL_WR, timeout) == TSS2_RC_SUCCESS) {
+ continue;
+ }
+ }
LOG_ERROR ("failed to write to fd %d: %s", fd, strerror (errno));
#endif
return written_total;
@@ -130,21 +142,23 @@ size_t
socket_recv_buf (
SOCKET sock,
uint8_t *data,
- size_t size)
+ size_t size,
+ int timeout)
{
- return read_all (sock, data, size);
+ return read_all (sock, data, size, timeout);
}
TSS2_RC
socket_xmit_buf (
SOCKET sock,
const void *buf,
- size_t size)
+ size_t size,
+ int timeout)
{
size_t ret;
LOGBLOB_DEBUG (buf, size, "Writing %zu bytes to socket %d:", size, sock);
- ret = write_all (sock, buf, size);
+ ret = write_all (sock, buf, size, timeout);
if (ret < size) {
#ifdef _WIN32
LOG_ERROR ("write to fd %d failed, errno %d: %s", sock, WSAGetLastError(), strerror (WSAGetLastError()));
@@ -333,14 +347,23 @@ socket_set_nonblock (SOCKET sock)
}
TSS2_RC
-socket_poll (SOCKET sock, int timeout)
+socket_poll (SOCKET sock, int wait_flags, int timeout)
{
#ifndef _WIN32
struct pollfd fds;
int rc_poll, nfds = 1;
fds.fd = sock;
- fds.events = POLLIN;
+ fds.revents = fds.events = 0;
+
+ if (wait_flags & SOCKET_POLL_RD)
+ fds.events |= POLLIN;
+ if (wait_flags & SOCKET_POLL_WR)
+ fds.events |= POLLOUT;
+
+ if (!fds.events) {
+ return TSS2_TCTI_RC_BAD_VALUE;
+ }
/* Timeout of 0 ie return immediately is not
* well handled throughout the upper layers currenty
Please see full patch here:
We have observed the following problem:
When using the `mssim` TCTI, “Resource temporarily unavailable” errors occur regularly. Often 2 out of 3 runs will fail! For example, it looks like this:
Note that “Resource temporarily unavailable” comes down to an `EAGAIN` error (i.e. errno 11).
I think the reason this can happen is the way `tcti_mssim_receive()` is currently implemented: it first calls `poll()` on the network socket until it becomes "ready for reading", and once this has happened, it attempts to `recv()` the full response message. This is actually wrapped in the `socket_recv_buf()` function, which just calls the `read_all()` function.
There are, to my understanding, at least two ways this can go wrong:
1. If `poll()` signals that the network socket is "ready for reading", it means that some bytes can be read now, but it does not guarantee that the full message is available yet. Nonetheless, the subsequent `read_all()` always attempts to read the full message by repeatedly calling `recv()`. This fails if the full message cannot be read right now. Specifically, because the socket was opened in `O_NONBLOCK` mode, `read_all()` fails with an `EAGAIN` error (instead of blocking and waiting) if insufficient data is available at the moment. And that is, I suppose, precisely what we are seeing.
2. At least on the Linux platform, the `poll()` and `select()` functions may produce a so-called "spurious readiness notification". This means that a socket may be reported as "ready for reading" while the subsequent `read()` would still block because the socket is not actually ready. In `O_NONBLOCK` mode, `recv()` or `read()` will fail with `EAGAIN` in this situation. For reference, please see the "BUGS" sections at:
At the core of the problem is that the `TEMP_RETRY` macro does not currently handle the `EAGAIN` (and `EWOULDBLOCK`) errors, at least on the Linux platform. It appears there is some handling on FreeBSD already 🤔
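To make the first failure mode concrete, here is a minimal, self-contained sketch for Linux. It is not tpm2-tss code: `RETRY_ON_EINTR` and `demo_partial_read()` are hypothetical names, and the macro only assumes that a retry helper restarts the call on `EINTR` and on nothing else. The demo delivers part of a message, waits for `poll()` to report readability, and then tries to read the whole message from a non-blocking socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for an EINTR-only retry macro (an assumption,
 * not the actual tpm2-tss definition). */
#define RETRY_ON_EINTR(dest, expr) \
    do { (dest) = (expr); } while ((dest) == -1 && errno == EINTR)

/*
 * Sends only `avail` bytes of a `want`-byte message over a socket pair,
 * polls the non-blocking reader until it is readable, then tries to read
 * all `want` bytes. Returns 0 if everything arrived, the errno that
 * stopped the read loop otherwise, or -1 if the setup failed.
 * `avail` must be greater than zero, otherwise poll() has nothing to report.
 */
int
demo_partial_read (size_t avail, size_t want, size_t *got)
{
    int sv[2], flags, result = 0;
    uint8_t msg[64] = { 0 }, buf[64];
    ssize_t n;

    *got = 0;
    if (avail == 0 || avail > sizeof (msg) || want > sizeof (buf))
        return -1;
    if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Put the reading end into non-blocking mode and send the partial message. */
    flags = fcntl (sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl (sv[0], F_SETFL, flags | O_NONBLOCK) == -1 ||
        write (sv[1], msg, avail) != (ssize_t) avail) {
        close (sv[0]);
        close (sv[1]);
        return -1;
    }

    /* poll() reports "readable" as soon as at least one byte is queued. */
    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    if (poll (&pfd, 1, 1000) != 1) {
        close (sv[0]);
        close (sv[1]);
        return -1;
    }

    while (*got < want) {
        RETRY_ON_EINTR (n, read (sv[0], &buf[*got], want - *got));
        if (n < 0) {
            result = errno;   /* EAGAIN once the queued bytes are used up */
            break;
        }
        if (n == 0)
            break;            /* peer closed the connection */
        *got += (size_t) n;
    }

    close (sv[0]);
    close (sv[1]);
    return result;
}
```

Calling `demo_partial_read (4, 8, &got)` stops with `EAGAIN` after 4 bytes even though `poll()` reported the socket as readable, whereas `demo_partial_read (8, 8, &got)` completes normally.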
The following patch contains a simple workaround that has fixed the “Resource temporarily unavailable” problem for us:
I think the preferable solution would be going back to polling when it turns out that no or insufficient data is available for reading, while keeping the partial message that has already been read. But that would probably require some more significant changes.
Regards.