-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement systemd-notify directly #26456
base: dev
Are you sure you want to change the base?
Conversation
Here's attempt v2. Most of the issues should've been resolved, and I even tried to write a proper test. Since signalling readiness now also automatically starts the watchdog interval if applicable, I decided to rename the functions to what they mean from the controller's pov rather than what they technically do, so Also, all checks regarding availability of unix sockets, and whether we actually run under a service manager using the sd_notify interface now happen in the sd-notify module, so as much logic as possible is moved out of |
import logger from './logger';
/**
* Handle sd_notify protocol, @see https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html
* No-op if running on unsupported platforms or without Type=notify
*/
export async function initSdNotify(): Promise<{stopping: () => void; stop: () => void} | undefined> {
if (!process.env.NOTIFY_SOCKET) {
return;
}
try {
const {createSocket} = await import('unix-dgram');
const socket = createSocket('unix_dgram');
const sendToSystemd = (msg: string): void => {
const buffer = Buffer.from(msg);
socket.send(buffer, 0, buffer.byteLength, process.env.NOTIFY_SOCKET!, (err) => {
if (err) {
logger.warning(`Failed to send "${msg}" to systemd: ${err.message}`);
}
});
};
const stopping = (): void => sendToSystemd('STOPPING=1');
sendToSystemd('READY=1');
const wdUSec = process.env.WATCHDOG_USEC !== undefined ? Math.max(0, parseInt(process.env.WATCHDOG_USEC, 10)) : -1;
if (wdUSec > 0) {
// Convert us to ms, send twice as frequently as the timeout
const watchdogInterval = setInterval(() => sendToSystemd('WATCHDOG=1'), wdUSec / 1000 / 2);
return {
stopping,
stop: (): void => clearInterval(watchdogInterval),
};
}
if (wdUSec !== -1) {
logger.warning(`WATCHDOG_USEC invalid: "${process.env.WATCHDOG_USEC}", parsed to "${wdUSec}"`);
}
return {
stopping,
stop: (): void => {},
};
} catch (error) {
// Ignore error on Windows if not running on WSL, as UNIX sockets don't exist on Windows.
// Unlikely that NOTIFY_SOCKET is set anyways but better be safe.
if (process.platform !== 'win32' || process.env.WSL_DISTRO_NAME) {
throw error;
}
}
} I'd go with something like this (quick draft/untested). The return is stored in Controller, and functions are called as needed. That should keep only the minimum amount of refs, and completely no-op when unsupported. Note: can use @Koenkk what do you think? |
@Nerivec sounds good! However I'm wondering if all of this code should be directly in Z2M, I think this deserves its own NPM package? |
It's tailored pretty specifically for Z2M though (the whole ready/stop thingy). You'd have to make it more generic for an NPM package, and lose the optimizations. |
@srett I added the changes, and full coverage, but I can't actually test it on a real setup at the moment. Can you check? |
@Nerivec can give it the smoke test in my local setup tomorrow. Regarding the naming of the methods, I opted for past tense since we're calling those two functions when the action in question finished happening, i.e. There is one potential problem with your merge of Generally the idea of this whole sd_notify is:
|
If loading a module fails we have way bigger problems though, I don't think it's much of an issue (even though we technically have to cover the code path for it in this scenario). The previous code had special handling because the imported module hooked into a non-node system that could fail. The new code should only fail if on unsupported system (i.e. |
Dependency on sd-notify removed. sd-notify is wrapping libsystemd, which requires installing libsystemd-dev for building, and loads libsystemd.so at runtime. Since we only need the sd_notify part of the lib, and the protocol is trivial to implement, do so. We also now bail out early in case a watchdog timeout is set and the notification module could not be loaded, as in this case systemd would constantly kill and restart zigbee2mqtt, resulting in unstable operation that might not be obvious at first sight.
It would also fail if you don't have a c++ compiler installed, which coincidentally is also the case in the CI, hence the first version failed during the tests with "unix_dgram.node not found". |
@Koenkk is |
@Nerivec we can try removing it for now, and if it creates problems, only skip it for Windows |
@Nerivec Have been running this for ~20 hours now, all seems good. :) Have you considered making unix-dgram non-optional? It should basically make the case where we run into an exception when trying to load the library go away. But in either case this is already better than before... |
Remove the dependency on sd-notify, which in turn wraps the native libsystemd.so to use the systemd notification mechanism. This also means we have one less build dependency (libsystemd-dev on debian an derivatives).
The protocol is now implemented in TypeScript on top of unix-dgram.
While at it, I also made it so that z2m would exit immediately if loading the sd-notify module fails for whatever reason (unix-dgram unavailable) and z2m is run with a WatchdogTimeout configured, since there is no way to avoid the service getting killed over and over again. The accompanying log message hopefully makes it easier to quickly find the root cause, see #21463 / #26390
Fun fact: While the systemd developers have taken measures to reduce dependencies of libsystemd, this was famously how openssh was backdoored last year, since sshd also used libsystemd for sd_notify, and also loaded liblzma for unrelated reasons, which contained the actual backdoor targeting sshd.