Two meson test improvements for running integration tests #14513

DaanDeMeyer · 2025-04-24T07:59:02Z

No description provided.

mesonbuild/mtest.py

bonzini · 2025-04-24T08:05:24Z

mesonbuild/mtest.py


                # Make sure the termination signal actually kills the process
                # group, otherwise retry with a SIGKILL.
                with suppress(asyncio.TimeoutError):
-                    await asyncio.wait_for(p.wait(), timeout=0.5)
+                    await asyncio.wait_for(p.wait(), timeout=60)


0.5 could be too little but can you test what happens if a test does not respond (which can happen, for example, if the test sets a flag on SIGTERM but is stuck so that it never goes back to the main event loop)? Having to wait 60 seconds after Ctrl-C would be a bad experience.

Yes it would be a bad experience but unlike having a timeout that's too small, this can be fixed in the test by making sure it doesn't ignore SIGTERM. I think we should err on the side of a larger timeout, since a timeout that's too small cannot be dealt with by users, whereas a timeout that's too large can be dealt with by users by fixing the test to not ignore SIGTERM.

I am not talking about a test ignoring SIGTERM. The most common way to handle SIGTERM is to set a flag and handle the signal in the main event loop, but that won't happen if the task is stuck in an infinite loop. Maybe you could implement it so that when meson test caught a SIGINT it uses the 0.5 seconds timeout, whereas SIGTERM uses a longer one?

With respect to sending SIGTERM to the parent only... it's complicated. You're sending SIGTERM to the parent to allow the parent process to orchestrate an orderly shutdown—including cleanup that might require communication with children, and it can prevent race conditions where children terminate before the parent can handle necessary cleanup.

However I'm worried that actual code in the wild may not properly handle SIGTERM or child process cleanup, which would lead to orphaned processes in applications that spawn children but don't maintain proper tracking of them. Those children would not get a SIGKILL, they would be reparented to init instead.

I guess we could try that and see if we get complaints, but also I would need to check the history more closely, because I seem to remember that the process group kill was added to fix a report. In principle I'd say that it is a bug for a process to not trap SIGTERM if it spawns processes, and therefore it's okay for the user to send SIGTERM to the process rather than the group; but in practice sending SIGTERM to the group is more robust against implementation bugs and the downside (potentially interrupting cleanup) is usually less severe than the downside of orphaned processes.

I am not talking about a test ignoring SIGTERM. The most common way to handle SIGTERM is to set a flag and handle the signal in the main event loop, but that won't happen if the task is stuck in an infinite loop. Maybe you could implement it so that when meson test caught a SIGINT it uses the 0.5 seconds timeout, whereas SIGTERM uses a longer one?

For daemons sure, but for tests? I'm pretty sure tests are mostly synchronous and generally won't handle SIGTERM at all.

Maybe you could implement it so that when meson test caught a SIGINT it uses the 0.5 seconds timeout, whereas SIGTERM uses a longer one?

I'm pretty sure I want the 60s timeout for both. If a test gets stuck and I use ctrl+c to interrupt it, I still want the logs to be saved and analyzed to figure out to help figure out why the test got stuck in the first place.

I dropped the process group change.

I'm pretty sure tests are mostly synchronous and generally won't handle SIGTERM at all.

It depends. Remember that this does not apply to the test executables only, but also to other processes they spawn. For simple unit tests that may be true; but there can be more complicated tests that talk to the project's executables, and those binaries under test might very well be handling SIGTERM.

Maybe you could implement it so that when meson test caught a SIGINT it uses the 0.5 seconds timeout, whereas SIGTERM uses a longer one?

I'm pretty sure I want the 60s timeout for both.

60s is definitely too much for interactive use, especially for the "immediate exit" case (triggered in meson test by pressing Ctrl-C three times in succession). What about adding a new option --sigterm-timeout?

Sure I wouldn't mind a new option, I'll do that

meson will send SIGTERM if the test gets stuck and hits the timeout, in which case we still want to do log saving and analysis, so let's add some signal handlers which allow us to do that. This won't be very useful until mesonbuild/meson#14513 lands, since we only get half a second from meson to handle SIGTERM before it sends SIGKILL, but let's land this already so we immediately start taking advantage of the meson fix once it lands.

When using meson to run integration tests and an integration test times out, we still want the opportunity to analyze and save the logs from the integration test so we can figure out why the test got stuck. Currently, meson only gives test processes half a second to handle SIGTERM before it sends SIGKILL, which is not sufficient time to analyze all the logs and do all other cleanup. Let's bump the timeout after sending SIGTERM to 60s, so that test processes that wish to do cleanup get some time to do so. This only has an effect if the test process installs a custom SIGTERM handler. If the test process doesn't do so, it'll get exited immediately like before.

meson will send SIGTERM if the test gets stuck and hits the timeout, in which case we still want to do log saving and analysis, so let's add some signal handlers which allow us to do that. This won't be very useful until mesonbuild/meson#14513 lands, since we only get half a second from meson to handle SIGTERM before it sends SIGKILL, but let's land this already so we immediately start taking advantage of the meson fix once it lands.

bonzini · 2025-05-12T16:20:54Z

Looks good to me, thanks. If you want you can add release notes in docs/markdown/snippets too.

DaanDeMeyer mentioned this pull request Apr 24, 2025

test/mkosi: kill vm or container slightly earlier than timeout to get journal systemd/systemd#37232

Open

bonzini added this to the 1.9 milestone Apr 24, 2025

bonzini added the test targets label Apr 24, 2025

bonzini suggested changes Apr 24, 2025

View reviewed changes

DaanDeMeyer force-pushed the test-fixes branch from cb92d0f to 8759f6c Compare April 24, 2025 08:13

DaanDeMeyer mentioned this pull request Apr 24, 2025

test: Add custom signal handlers to integration test wrapper script systemd/systemd#37234

Merged

DaanDeMeyer force-pushed the test-fixes branch from 8759f6c to f62674c Compare April 24, 2025 10:39

DaanDeMeyer force-pushed the test-fixes branch from f62674c to 84b4112 Compare April 24, 2025 12:53

DaanDeMeyer requested a review from jpakkane as a code owner April 24, 2025 12:53

bonzini approved these changes May 12, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two meson test improvements for running integration tests #14513

Two meson test improvements for running integration tests #14513

DaanDeMeyer commented Apr 24, 2025

bonzini Apr 24, 2025

DaanDeMeyer Apr 24, 2025

bonzini Apr 24, 2025 •

edited

Loading

DaanDeMeyer Apr 24, 2025

DaanDeMeyer Apr 24, 2025

bonzini Apr 24, 2025

DaanDeMeyer Apr 24, 2025

bonzini commented May 12, 2025

Two meson test improvements for running integration tests #14513

Are you sure you want to change the base?

Two meson test improvements for running integration tests #14513

Conversation

DaanDeMeyer commented Apr 24, 2025

bonzini Apr 24, 2025

Choose a reason for hiding this comment

DaanDeMeyer Apr 24, 2025

Choose a reason for hiding this comment

bonzini Apr 24, 2025 • edited Loading

Choose a reason for hiding this comment

DaanDeMeyer Apr 24, 2025

Choose a reason for hiding this comment

DaanDeMeyer Apr 24, 2025

Choose a reason for hiding this comment

bonzini Apr 24, 2025

Choose a reason for hiding this comment

DaanDeMeyer Apr 24, 2025

Choose a reason for hiding this comment

bonzini commented May 12, 2025

bonzini Apr 24, 2025 •

edited

Loading