Skip to content

feat: add commit time metrics #5380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 7, 2025

Conversation

0x009922
Copy link
Contributor

@0x009922 0x009922 commented Mar 24, 2025

Changes:

  • Measure last block commit time as the time since block creation until it is finally committed by the peer.
  • Extend Prometheus metrics with last_commit_time_ms gauge and commit_time_ms histogram (to observe trends)
  • Extend Status with commit_time_ms (same as Prometheus' last_commit_time_ms)
  • Restructured telemetry in iroha_core to clarify the data flow, introduced an actor. Also, finally made the feature-gate #[cfg(feature = "telemetry")] complete.

Additionally:

  • Refined buckets for tx_amounts histogram. It used default buckets1 which aren't suitable for amounts. I changed it to be in the ranges $$(-\infty, -10^{10}), [-10^{10}, -10^{8}), ..., [10^{8}, 10^{10}), [10^{10}, \infty)$$
  • Enhanced iroha_test_network as mentioned in Improve test framework NetworkPeer::start function #5382 + added tracing logging for better tracking of time and scopes.

Closes #5366
Closes #5388
Closes #5382
Closes #5404

API Changes

  • Status (from GET /status) contains a new field: commit_time_ms (compact u64)

TODO

Footnotes

  1. DEFAULT_BUCKETS: "The default buckets are tailored to broadly measure the response time (in seconds) of a network service. Most likely, however, you will be required to define buckets customized to your use case"

@0x009922 0x009922 added the api-changes Changes in the API for client libraries label Mar 24, 2025
@0x009922 0x009922 self-assigned this Mar 24, 2025
@0x009922 0x009922 force-pushed the commit-time-metrics branch from fdee4a9 to efdce1a Compare March 24, 2025 05:37
aoyako
aoyako previously approved these changes Apr 16, 2025
Copy link
Contributor

@dima74 dima74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nits from clippy

@0x009922
Copy link
Contributor Author

Docker CI fails 😞

Opened #5398

aoyako
aoyako previously approved these changes Apr 18, 2025
@0x009922
Copy link
Contributor Author

I noticed that tests sometimes fail because of the block stream disconnect. Some peers fail to send a message into the WebSocket within the timeout (10s) and close the stream. The test network then doesn't reconnect, and tests fail due to never reaching expected block height (which is detected by the block stream).

I don't know why the SendTimeout happens (

)

A solution could be to introduce a reconnection mechanism in the test network. Or increasing the timeout in Iroha.

@0x009922 0x009922 enabled auto-merge (squash) April 28, 2025 07:06
dima74
dima74 previously approved these changes Apr 29, 2025
@0x009922 0x009922 dismissed stale reviews from dima74 and aoyako via 7e23dab April 30, 2025 01:49
@0x009922 0x009922 force-pushed the commit-time-metrics branch from 7e23dab to 26fcb36 Compare April 30, 2025 02:44
@0x009922 0x009922 requested review from aoyako, dima74 and s8sato April 30, 2025 02:49
@0x009922 0x009922 force-pushed the commit-time-metrics branch from adf642f to 9e62d5f Compare May 2, 2025 01:20
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
@0x009922 0x009922 force-pushed the commit-time-metrics branch from 9e62d5f to cc04cf2 Compare May 2, 2025 04:39
Signed-off-by: 0x009922 <43530070+0x009922@users.noreply.github.com>
@0x009922 0x009922 merged commit 5b8d388 into hyperledger-iroha:main May 7, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-changes Changes in the API for client libraries
Projects
None yet
4 participants