In 6.2, Cluster Metric API calls to other nodes time out after 1s ignoring config #22595

tellistone · 2025-05-15T08:22:30Z

A community post here raises the following:

Graylog 6.2.0 cluster with nodes distributed across multiple geographical regions. When I log in to a node in one region, unable to view performance metrics (e.g., Memory/Heap, Buffers, Journal) for nodes in other regions. However, if I log in to a node in the same geographic location, all metrics display correctly.

In the logs seeing inter-node API timeouts like this:

2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1002 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <09ac9ab1-ec2c-4e54-88a0-1c74f0291dfe>, cause: timeout (duration: 1001 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1002 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <c81809a0-b020-489f-892c-15211dd73696>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <09ac9ab1-ec2c-4e54-88a0-1c74f0291dfe>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <7569ace1-6ea9-4f67-8531-da9c0919ee3d>, cause: timeout (duration: 1002 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <c81809a0-b020-489f-892c-15211dd73696>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1001 ms)

tried uncommenting and setting proxied_requests_default_call_timeout = 5s in server.conf, but it doesn’t seem to have any effect—the timeouts still occur at around 1 second. also reviewed the config file but couldn’t find any other relevant settings to adjust this timeout.

Everything else in the cluster appears to be functioning properly. This issue is specifically with viewing node performance stats across regions.

Environment Details:

OS: Ubuntu 24.04
Graylog Version: 6.2.0
Number of Graylog nodes: 9 (will be scaling to 20+)
OpenSearch Version: 2.15.0
Number of OpenSearch data nodes: 32

I’ve noticed that if you visit a node’s metrics page e.g., https://10.X.X.X:9000/system/metrics/node/, it fails to load for any node located in a different geographic region than the one you’re currently accessing. This doesn’t appear to be a firewall or DNS issue—I’ve verified both are working correctly.

It’s also worth noting that this behavior was not present in Graylog 6.1.11. In that version, the cluster page displayed journal and JVM heap metrics for all nodes, which relied on successful metric queries across the cluster. It seems that something changed in Graylog 6.2 that affects this functionality.

Is there a new or alternative setting in Graylog 6.2.0 to increase the inter-node API call timeout, or is this a regression?

The text was updated successfully, but these errors were encountered:

tellistone added bug triaged labels May 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

In 6.2, Cluster Metric API calls to other nodes time out after 1s ignoring config #22595

In 6.2, Cluster Metric API calls to other nodes time out after 1s ignoring config #22595

tellistone commented May 15, 2025

In 6.2, Cluster Metric API calls to other nodes time out after 1s ignoring config #22595

In 6.2, Cluster Metric API calls to other nodes time out after 1s ignoring config #22595

Comments

tellistone commented May 15, 2025