Skip to content

COUNTER-type metric names do not have the _total suffix #1154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
JasirVoriya opened this issue Mar 12, 2025 · 3 comments
Open

COUNTER-type metric names do not have the _total suffix #1154

JasirVoriya opened this issue Mar 12, 2025 · 3 comments
Assignees

Comments

@JasirVoriya
Copy link

Rules configuration:

rules:
  - pattern: kafka.network<type=RequestMetrics, name=(.+), (.+)=(.+), (.+)=(.+)><>Count
    name: kafka_network_RequestMetrics_$1
    type: COUNTER
    labels:
      "$2": "$3"
      "$4": "$5"
  - pattern: kafka.network<type=RequestMetrics, name=(.+), (.+)=(.+), (.+)=(.+)><>(.+)Rate
    name: kafka_network_RequestMetrics_$1
    type: GAUGE
    labels:
      "$2": "$3"
      "$4": "$5"
      rate: "$6"

Using the above configuration, metrics with the same name but different types are collected, as shown below:

MatchedRule{name='kafka_network_RequestMetrics_RequestsPerSec', matchName='kafka.network<type=RequestMetrics, name=RequestsPerSec, request=OffsetCommit, version=7><>Count: 13589', type='COUNTER' ... }
MatchedRule{name='kafka_network_RequestMetrics_RequestsPerSec', matchName='kafka.network<type=RequestMetrics, name=RequestsPerSec, request=OffsetCommit, version=7><>OneMinuteRate: 0.1664303423094177', type='GAUGE' ... }

Ideally, for metrics with a type of COUNTER, the metric name should include the _total suffix, resulting in kafka_network_RequestMetrics_RequestsPerSec_total. However, in practice, the suffix is missing.

After debugging the code, I identified the issue in the following section:

// io.prometheus.jmx.MatchedRuleToMetricSnapshotsConverter

    public static MetricSnapshots convert(List<MatchedRule> matchedRules) {
        Map<String, List<MatchedRule>> rulesByPrometheusMetricName = new HashMap<>();

        for (MatchedRule matchedRule : matchedRules) {
            List<MatchedRule> matchedRulesWithSameName =
                    rulesByPrometheusMetricName.computeIfAbsent(
                            matchedRule.name, name -> new ArrayList<>());
            matchedRulesWithSameName.add(matchedRule);
        }

        /* Code omitted */

        MetricSnapshots.Builder result = MetricSnapshots.builder();
        for (List<MatchedRule> rulesWithSameName : rulesByPrometheusMetricName.values()) {
            result.metricSnapshot(convertRulesWithSameName(rulesWithSameName));
        }
        return result.build();
    }
    private static MetricSnapshot convertRulesWithSameName(List<MatchedRule> rulesWithSameName) {
        boolean labelsUnique = isLabelsUnique(rulesWithSameName);
        switch (getType(rulesWithSameName)) {
            case "COUNTER":
                /* Code omitted */
            case "GAUGE":
                /* Code omitted */
            default:
                /* Code omitted */
        }
    }

    /** If all rules have the same type, that type is returned. Otherwise, "UNKNOWN" is returned. */
    private static String getType(List<MatchedRule> rulesWithSameName) {
        if (rulesWithSameName.stream().map(rule -> rule.type).distinct().count() == 1) {
            return rulesWithSameName.get(0).type;
        }
        return "UNKNOWN";
    }

Here, metrics with the same name are grouped together, and when the getType method is called, it returns "UNKNOWN". This causes COUNTER type metrics to be incorrectly processed and results in the _total suffix not being added.

@JasirVoriya
Copy link
Author

The grouping logic in the convert method should not only consider the name of the metric but also include the type in the grouping key. This ensures that metrics with the same name but different type are grouped separately and handled correctly.

Current Code

The current grouping logic is as follows:

for (MatchedRule matchedRule : matchedRules) {
    List<MatchedRule> matchedRulesWithSameName =
            rulesByPrometheusMetricName.computeIfAbsent(
                    matchedRule.name, name -> new ArrayList<>());
    matchedRulesWithSameName.add(matchedRule);
}

Suggested Improvement

Update the grouping logic to include both name and type as the grouping key. For example:

for (MatchedRule matchedRule : matchedRules) {
    String key = matchedRule.name + ":" + matchedRule.type; // Combine name and type as the key
    List<MatchedRule> matchedRulesWithSameKey =
            rulesByPrometheusMetricName.computeIfAbsent(
                    key, k -> new ArrayList<>());
    matchedRulesWithSameKey.add(matchedRule);
}

This change ensures that metrics with the same name but different type are treated as distinct groups. For example:

  • kafka_network_RequestMetrics_RequestsPerSec:COUNTER
  • kafka_network_RequestMetrics_RequestsPerSec:GAUGE

Benefits

  1. Correct Handling of COUNTER Metrics: Metrics with the COUNTER type will be processed correctly, and the _total suffix will be added as expected.
  2. Avoid Type Conflicts: Metrics with the same name but different type will no longer conflict during processing.
  3. Improved Clarity: Separating metrics by type ensures that the Prometheus exporter produces clear and unambiguous metrics.

@JasirVoriya
Copy link
Author

However, this will trigger a duplicate metric name exception:

An Exception occurred while scraping metrics: java.lang.IllegalArgumentException: kafka_network_RequestMetrics_ErrorsPerSec: duplicate metric name
	at io.prometheus.metrics.model.snapshots.MetricSnapshots.<init>(MetricSnapshots.java:45)
	at io.prometheus.metrics.model.snapshots.MetricSnapshots$Builder.build(MetricSnapshots.java:99)
	at io.prometheus.jmx.MatchedRuleToMetricSnapshotsConverter.convert(MatchedRuleToMetricSnapshotsConverter.java:82)
	at io.prometheus.jmx.JmxCollector.collect(JmxCollector.java:909)
	at io.prometheus.metrics.model.registry.MultiCollector.collect(MultiCollector.java:21)
	at io.prometheus.metrics.model.registry.PrometheusRegistry.scrape(PrometheusRegistry.java:84)
	at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.scrape(PrometheusScrapeHandler.java:135)
	at io.prometheus.metrics.exporter.common.PrometheusScrapeHandler.handleRequest(PrometheusScrapeHandler.java:56)
	at io.prometheus.metrics.exporter.httpserver.MetricsHandler.handle(MetricsHandler.java:33)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:98)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:852)
	at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:95)
	at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:819)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:842)

@JasirVoriya
Copy link
Author

So, in this situation, how should it be handled?

@dhoard dhoard self-assigned this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants