# collectd/kafka

Monitors a Kafka instance using collectd's GenericJMX plugin.

This monitor comes with a set of built-in MBean definitions for which it pulls metrics from Kafka's JMX endpoint.

Note that this monitor supports Kafka v0.8.2.x and above. For Kafka v1.x.x and above, in addition to the list of default metrics, `kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs` is a good metric to monitor, since it shows how long brokers wait for requests to Zookeeper to complete. Because Zookeeper is an integral part of a Kafka cluster, monitoring it with the [Zookeeper monitor](https://docs.signalfx.com/en/latest/integrations/agent/monitors/collectd-zookeeper.html) is recommended. It is also a good idea to monitor disk utilization and network metrics of the underlying host.

See https://github.com/signalfx/integrations/tree/master/collectd-kafka.
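As a sketch of how the ZooKeeper request-latency MBean mentioned above could be pulled in, the following adds it via `mBeanDefinitions`. The host, port, cluster name, definition name, and the `Mean` attribute are illustrative assumptions, not part of the monitor's stock configuration:

```yaml
monitors:
  - type: collectd/kafka
    host: 127.0.0.1          # placeholder: broker host with remote JMX enabled
    port: 9999               # placeholder: JMX port
    clusterName: kafka-prod  # placeholder cluster name
    mBeanDefinitions:
      kafka-zookeeper-request-latency:   # hypothetical definition name
        objectName: "kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs"
        values:
          - type: gauge
            instancePrefix: "kafka-zookeeper-request-latency"
            attribute: Mean  # assumption: one of the histogram's exposed JMX attributes
```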

Monitor Type: collectd/kafka

Monitor Source Code

Accepts Endpoints: Yes

Multiple Instances Allowed: Yes

## Configuration

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `host` | yes | `string` | Host to connect to -- JMX must be configured for remote access and accessible from the agent |
| `port` | yes | `integer` | JMX connection port (NOT the RMI port) on the application. This corresponds to the `com.sun.management.jmxremote.port` Java property that should be set on the JVM when running the application. |
| `name` | no | `string` |  |
| `serviceName` | no | `string` | This is how the service type is identified in the SignalFx UI so that you can get built-in content for it. For custom JMX integrations, it can be set to whatever you like, and metrics will get the special property `sf_hostHasService` set to this value. |
| `serviceURL` | no | `string` | The JMX connection string. This is rendered as a Go template and has access to the other values in this config. NOTE: under normal circumstances it is not advised to set this string directly; setting the host and port as specified above is preferred. (**default:** `service:jmx:rmi:///jndi/rmi://{{.Host}}:{{.Port}}/jmxrmi`) |
| `instancePrefix` | no | `string` |  |
| `username` | no | `string` |  |
| `password` | no | `string` |  |
| `customDimensions` | no | `map of strings` | Takes in key-value pairs of custom dimensions at the connection level. |
| `mBeansToCollect` | no | `list of strings` | A list of the MBeans defined in `mBeanDefinitions` to actually collect. If not provided, all defined MBeans will be collected. |
| `mBeansToOmit` | no | `list of strings` | A list of the MBeans to omit. This comes in handy when only a few MBeans need to be omitted from the default list. |
| `mBeanDefinitions` | no | `map of objects (see below)` | Specifies how to map JMX MBean values to metrics. Specific service monitors such as cassandra, kafka, or activemq come pre-loaded with a set of mappings, and any that you add in this option will be merged with those. See collectd GenericJMX for more details. |
| `clusterName` | yes | `string` | Cluster name to which the broker belongs |
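The options above combine into a minimal agent monitor block. This is a sketch with placeholder values; only `host`, `port`, and `clusterName` are required:

```yaml
monitors:
  - type: collectd/kafka
    host: 127.0.0.1          # placeholder: broker host with remote JMX enabled
    port: 9999               # placeholder: value of com.sun.management.jmxremote.port
    clusterName: kafka-prod  # placeholder cluster name (required)
    customDimensions:
      env: staging           # example connection-level dimension (illustrative)
```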

The nested mBeanDefinitions config object has the following fields:

| Config option | Required | Type |
| --- | --- | --- |
| `objectName` | no | `string` |
| `instancePrefix` | no | `string` |
| `instanceFrom` | no | `list of strings` |
| `values` | no | `list of objects (see below)` |
| `dimensions` | no | `list of strings` |

The nested values config object has the following fields:

| Config option | Required | Type |
| --- | --- | --- |
| `type` | no | `string` |
| `table` | no | `bool` (**default:** `false`) |
| `instancePrefix` | no | `string` |
| `instanceFrom` | no | `list of strings` |
| `attribute` | no | `string` |
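Putting the two nested objects together, a custom MBean mapping might look like the following sketch. The definition name, object name, and attribute are illustrative and would need to match real MBeans exposed by your broker:

```yaml
mBeanDefinitions:
  kafka-total-fetch-requests:          # hypothetical definition name
    objectName: "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec"
    instancePrefix: "kafka-total-fetch-requests"
    values:
      - type: counter                  # collectd data-set type for the emitted metric
        table: false                   # single value, not a table of composite data
        attribute: Count               # JMX attribute to read (assumption)
```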

## Metrics

The following table lists the metrics available for this monitor. Metrics that are marked as Included are standard metrics and are monitored by default.

| Name | Type | Included | Description |
| --- | --- | --- | --- |
| `counter.kafka-all-bytes-in` | cumulative |  | Number of bytes received per second across all topics |
| `counter.kafka-all-bytes-out` | cumulative |  | Number of bytes transmitted per second across all topics |
| `counter.kafka-log-flushes` | cumulative |  | Number of log flushes per second |
| `counter.kafka-messages-in` | cumulative | ✔ | Number of messages received per second across all topics |
| `counter.kafka.fetch-consumer.total-time.count` | cumulative | ✔ | Number of fetch requests from consumers per second across all partitions |
| `counter.kafka.fetch-follower.total-time.count` | cumulative |  | Number of fetch requests from followers per second across all partitions |
| `counter.kafka.produce.total-time.99th` | gauge |  | 99th percentile of time in milliseconds to process produce requests |
| `counter.kafka.produce.total-time.count` | cumulative | ✔ | Number of producer requests |
| `counter.kafka.produce.total-time.median` | gauge |  | Median time it takes to process a produce request |
| `gauge.kafka-active-controllers` | gauge | ✔ | Specifies whether the broker is an active controller |
| `gauge.kafka-log-flush-time-ms` | gauge |  | Average number of milliseconds to flush a log |
| `gauge.kafka-log-flush-time-ms-p95` | gauge |  | 95th percentile of log flush time in milliseconds |
| `gauge.kafka-request-queue` | gauge | ✔ | Number of requests in the request queue across all partitions on the broker |
| `gauge.kafka-underreplicated-partitions` | gauge | ✔ | Number of underreplicated partitions across all topics on the broker |
| `gauge.kafka.fetch-consumer.total-time.99th` | gauge | ✔ | 99th percentile of time in milliseconds to process fetch requests from consumers |
| `gauge.kafka.fetch-consumer.total-time.median` | gauge | ✔ | Median time it takes to process a fetch request from consumers |
| `gauge.kafka.fetch-follower.total-time.99th` | gauge | ✔ | 99th percentile of time in milliseconds to process fetch requests from followers |
| `gauge.kafka.fetch-follower.total-time.median` | gauge | ✔ | Median time it takes to process a fetch request from followers |
| `kafka-isr-expands` | cumulative |  | When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR. |
| `kafka-isr-shrinks` | cumulative |  | When a broker goes down, the ISR for some of the partitions will shrink. When that broker is up again, the ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0. |
| `kafka-leader-election-rate` | cumulative |  | Number of leader elections |
| `kafka-max-lag` | gauge |  | Maximum lag in messages between the follower and leader replicas |
| `kafka-offline-partitions-count` | gauge |  | Number of partitions that don't have an active leader and are hence not writable or readable |
| `kafka-unclean-elections` | cumulative |  | Number of unclean leader elections. This happens when a leader goes down and an out-of-sync replica is chosen to be the leader |

To specify custom metrics you want to monitor, add a metricsToInclude filter to the agent configuration, as shown in the code snippet below. The snippet lists all available custom metrics. You can copy and paste the snippet into your configuration file, then delete any custom metrics that you do not want sent.

Note that some of the custom metrics require you to set a flag as well as add them to the list. Check the monitor configuration file to see if a flag is required for gathering additional metrics.

```yaml
metricsToInclude:
  - metricNames:
    - counter.kafka-all-bytes-in
    - counter.kafka-all-bytes-out
    - counter.kafka-log-flushes
    - counter.kafka.fetch-follower.total-time.count
    - counter.kafka.produce.total-time.99th
    - counter.kafka.produce.total-time.median
    - gauge.kafka-log-flush-time-ms
    - gauge.kafka-log-flush-time-ms-p95
    - kafka-isr-expands
    - kafka-isr-shrinks
    - kafka-leader-election-rate
    - kafka-max-lag
    - kafka-offline-partitions-count
    - kafka-unclean-elections
    monitorType: collectd/kafka
```