TCTI fails with “Resource temporarily unavailable” (EAGAIN) errors #2949

Open
danieltrick opened this issue Feb 20, 2025 · 1 comment

danieltrick commented Feb 20, 2025

We have observed the following problem:

When using the mssim TCTI, “Resource temporarily unavailable” errors occur regularly. Often 2 out of 3 runs will fail!

For example, it looks like this:

tpm2_load -T 'mssim:host=192.168.178.47,port=2323' -C 0x81000006 -P 12345 -u ecc.pub -r ecc.priv -c ecc.ctx

Error message:

WARNING:tcti:src/util-io/io.c:66:read_all() read on fd 3 failed with errno 11: Resource temporarily unavailable
ERROR:esys:src/tss2-esys/api/Esys_ContextSave.c:251:Esys_ContextSave_Finish() Received a non-TPM Error
ERROR:esys:src/tss2-esys/api/Esys_ContextSave.c:92:Esys_ContextSave() Esys Finish ErrorCode (0x000a000a)
ERROR: Esys_ContextSave(0xA000A) - tcti:IO failure

Note that “Resource temporarily unavailable” comes down to an EAGAIN error (i.e. errno 11).


I think this happens because of the way tcti_mssim_receive() is currently implemented: it first calls poll() on the network socket until the socket becomes "ready for reading", and then attempts to recv() the full response message. The read is wrapped in the socket_recv_buf() function, which simply calls read_all().

There are, to my understanding, at least two ways this can go wrong:

  • When poll() signals that the network socket is "ready for reading", it only means that some bytes can be read now; it does not guarantee that the full message is available yet. Nonetheless, the subsequent read_all() always attempts to read the full message by calling recv() repeatedly. Because the socket was opened in O_NONBLOCK mode, read_all() fails with an EAGAIN error (instead of blocking and waiting) whenever insufficient data is available at that moment. That is, I suppose, precisely what we are seeing.

  • At least on the Linux platform, the poll() and select() functions may produce a so-called "spurious readiness notification". This means that a socket may be reported as "ready for reading" although a subsequent read() would still block, because the socket is not actually ready. In O_NONBLOCK mode, recv() or read() will fail with EAGAIN in this situation.

    For reference, please see the "BUGS" sections at:


At the core of the problem is that the TEMP_RETRY macro does not currently handle the EAGAIN (and EWOULDBLOCK) errors, at least on the Linux platform. It appears there is already some handling on FreeBSD 🤔
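The first failure mode can be reproduced in isolation, without a TPM simulator. The following is a minimal sketch, not code from tpm2-tss: `naive_read_all()` and `demo_partial_message()` are hypothetical names, and a local `socketpair()` stands in for the TCP connection to the simulator. The loop retries only on EINTR, as the current TEMP_RETRY does on Linux, and is asked for 8 bytes when only 4 have arrived.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a read_all()-style loop: it retries on EINTR
 * only and gives up on any other error, including EAGAIN.
 * Returns the number of bytes read; *err receives errno on failure. */
static size_t naive_read_all(int fd, unsigned char *buf, size_t size, int *err)
{
    size_t total = 0;

    *err = 0;
    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            *err = errno;   /* EAGAIN ends up here */
            break;
        }
        if (n == 0)
            break;          /* peer closed the connection */
        total += (size_t)n;
    }
    return total;
}

/* Sends only the first 4 bytes of an 8-byte "message", waits until poll()
 * reports the receiving end as readable, then tries to read all 8 bytes.
 * Returns the errno seen by the naive loop (0 if none, -1 on setup failure)
 * and stores the number of bytes actually read in *got. */
static int demo_partial_message(size_t *got)
{
    static const unsigned char half[4] = { 0x80, 0x01, 0x00, 0x00 };
    unsigned char buf[8];
    struct pollfd pfd;
    int sv[2], err = -1;

    *got = 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK) == 0 &&
        write(sv[1], half, sizeof half) == (ssize_t)sizeof half) {
        pfd.fd = sv[0];
        pfd.events = POLLIN;
        pfd.revents = 0;
        /* poll() succeeds as soon as *some* data is readable */
        if (poll(&pfd, 1, 1000) == 1)
            *got = naive_read_all(sv[0], buf, sizeof buf, &err);
    }
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Calling `demo_partial_message()` should report EAGAIN with 4 bytes read: poll() reported the socket as readable, the first read() returned the available half, and the second read() failed because the socket is non-blocking and nothing more had arrived.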

The following patch contains a simple workaround that has fixed the “Resource temporarily unavailable” problem for us:

diff --git a/src/util-io/io.h b/src/util-io/io.h
index 595177d3..dc9a35fa 100644
--- a/src/util-io/io.h
+++ b/src/util-io/io.h
@@ -44,11 +44,12 @@ typedef SSIZE_T ssize_t;
     dest =__ret; }
 #else
 #define TEMP_RETRY(dest, exp) \
-{   int __ret; \
+{   int __ret, __err = 0; \
     do { \
+        if (__err > 0) usleep(100U); \
         __ret = exp; \
-    } while (__ret == SOCKET_ERROR && errno == EINTR); \
-    ((dest)) =__ret; }
+    } while ((__ret == SOCKET_ERROR) && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) && (++__err < 32767)); \
+    ((dest)) = __ret; }
 #endif
 
 #ifdef __cplusplus

I think the preferable solution would be to go back to polling whenever it turns out that no data or insufficient data is available for reading, while keeping the part of the message that has already been read. But that would probably require more significant changes.
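To make the idea concrete, here is a minimal standalone sketch of such a loop. It is an illustration under assumptions, not the tpm2-tss implementation: `read_exact_with_poll()` is a hypothetical helper, it works on a plain POSIX file descriptor, and the timeout applies to each individual wait rather than to the whole message.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of tpm2-tss): read exactly `size` bytes
 * from a non-blocking descriptor. Whenever read() reports EAGAIN or
 * EWOULDBLOCK, the bytes received so far are kept and the function goes
 * back to poll() until more data arrives.
 * timeout_ms bounds each individual wait (-1 waits indefinitely).
 * Returns the number of bytes read; a short count means timeout,
 * end of stream, or a hard error. */
static size_t read_exact_with_poll(int fd, unsigned char *buf, size_t size,
                                   int timeout_ms)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n > 0) {
            total += (size_t)n;     /* keep the partial message */
            continue;
        }
        if (n == 0)
            break;                  /* peer closed the connection */
        if (errno == EINTR)
            continue;               /* interrupted, simply retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            break;                  /* hard error */

        /* No data right now: wait until the descriptor is readable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc < 0 && errno == EINTR);
        if (rc <= 0)
            break;                  /* timeout or poll() failure */
    }
    return total;
}
```

A spurious readiness notification is handled by the same path: read() fails with EAGAIN again and the loop returns to poll(). A production version would probably track an overall deadline instead of restarting the timeout on every wait.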

Regards.


danieltrick commented Feb 21, 2025

A somewhat better workaround is probably to invoke poll() whenever an EAGAIN error is encountered:

diff --git a/src/util-io/io.c b/src/util-io/io.c
index 1dc1cdc7..68bd933e 100644
--- a/src/util-io/io.c
+++ b/src/util-io/io.c
@@ -51,7 +51,8 @@ size_t
 read_all (
     SOCKET fd,
     uint8_t *data,
-    size_t size)
+    size_t size,
+    int timeout)
 {
     ssize_t recvd;
     size_t recvd_total = 0;
@@ -69,6 +70,11 @@ read_all (
 #else
         TEMP_RETRY (recvd, read (fd, &data [recvd_total], size));
         if (recvd < 0) {
+            if (errno == EAGAIN || errno == EWOULDBLOCK) {
+                if (socket_poll(fd, SOCKET_POLL_RD, timeout) == TSS2_RC_SUCCESS) {
+                    continue;
+                }
+            }
             LOG_WARNING ("read on fd %d failed with errno %d: %s",
                          fd, errno, strerror (errno));
             return recvd_total;
@@ -91,7 +97,8 @@ size_t
 write_all (
     SOCKET fd,
     const uint8_t *buf,
-    size_t size)
+    size_t size,
+    int timeout)
 {
     ssize_t written = 0;
     size_t written_total = 0;
@@ -117,6 +124,11 @@ write_all (
 #ifdef _WIN32
             LOG_ERROR ("failed to write to fd %d: %s", fd, strerror (WSAGetLastError()));
 #else
+            if (errno == EAGAIN || errno == EWOULDBLOCK) {
+                if (socket_poll(fd, SOCKET_POLL_WR, timeout) == TSS2_RC_SUCCESS) {
+                    continue;
+                }
+            }
             LOG_ERROR ("failed to write to fd %d: %s", fd, strerror (errno));
 #endif
             return written_total;
@@ -130,21 +142,23 @@ size_t
 socket_recv_buf (
     SOCKET sock,
     uint8_t *data,
-    size_t size)
+    size_t size,
+    int timeout)
 {
-    return read_all (sock, data, size);
+    return read_all (sock, data, size, timeout);
 }
 
 TSS2_RC
 socket_xmit_buf (
     SOCKET sock,
     const void *buf,
-    size_t size)
+    size_t size,
+    int timeout)
 {
     size_t ret;
 
     LOGBLOB_DEBUG (buf, size, "Writing %zu bytes to socket %d:", size, sock);
-    ret = write_all (sock, buf, size);
+    ret = write_all (sock, buf, size, timeout);
     if (ret < size) {
 #ifdef _WIN32
         LOG_ERROR ("write to fd %d failed, errno %d: %s", sock, WSAGetLastError(), strerror (WSAGetLastError()));
@@ -333,14 +347,23 @@ socket_set_nonblock (SOCKET sock)
 }
 
 TSS2_RC
-socket_poll (SOCKET sock, int timeout)
+socket_poll (SOCKET sock, int wait_flags, int timeout)
 {
 #ifndef _WIN32
     struct pollfd fds;
     int rc_poll, nfds = 1;
 
     fds.fd = sock;
-    fds.events = POLLIN;
+    fds.revents = fds.events = 0;
+
+    if (wait_flags & SOCKET_POLL_RD)
+        fds.events |= POLLIN;
+    if (wait_flags & SOCKET_POLL_WR)
+        fds.events |= POLLOUT;
+
+    if (!fds.events) {
+        return TSS2_TCTI_RC_BAD_VALUE;
+    }
 
     /* Timeout of 0 ie return immediately is not
      * well handled throughout the upper layers currenty

Please see the full patch here:
libtss2-tcti-mssim-patch-v6.patch
libtss2-tcti-mssim-patch-v6.tar.gz

@danieltrick danieltrick changed the title TCTI often fails with “Resource temporarily unavailable” (EAGAIN) errors TCTI fails with “Resource temporarily unavailable” (EAGAIN) errors Feb 21, 2025