# collectd/kafka

Monitors a Kafka instance using collectd's GenericJMX plugin.

This monitor comes with a set of built-in MBean definitions for which it pulls metrics from Kafka's JMX endpoint.

Note that this monitor supports Kafka v0.8.2.x and above. For Kafka v1.x.x and above, in addition to the list of default metrics, `kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs` is a good metric to monitor, since it shows how long brokers wait for requests to Zookeeper to complete. Because Zookeeper is an integral part of a Kafka cluster, monitoring it with the [Zookeeper monitor](https://docs.signalfx.com/en/latest/integrations/agent/monitors/collectd-zookeeper.html) is recommended. It is also a good idea to monitor disk utilization and network metrics of the underlying host.

See https://github.com/signalfx/integrations/tree/master/collectd-kafka.
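As a sketch of how the ZooKeeper request-latency MBean mentioned above could be pulled in, the following adds it via `mBeanDefinitions`. The host, port, cluster name, definition name, and the `Mean` attribute are illustrative assumptions, not part of the monitor's stock configuration:

```yaml
monitors:
  - type: collectd/kafka
    host: 127.0.0.1          # placeholder: broker host with remote JMX enabled
    port: 9999               # placeholder: JMX port
    clusterName: kafka-prod  # placeholder cluster name
    mBeanDefinitions:
      kafka-zookeeper-request-latency:   # hypothetical definition name
        objectName: "kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs"
        values:
          - type: gauge
            instancePrefix: "kafka-zookeeper-request-latency"
            attribute: Mean  # assumption: one of the histogram's exposed JMX attributes
```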

Monitor Type: collectd/kafka

Monitor Source Code

Accepts Endpoints: Yes

Multiple Instances Allowed: Yes

## Configuration

| Config option | Required | Type | Description |
| --- | --- | --- | --- |
| `host` | yes | `string` | Host to connect to -- JMX must be configured for remote access and accessible from the agent |
| `port` | yes | `integer` | JMX connection port (NOT the RMI port) on the application. This corresponds to the `com.sun.management.jmxremote.port` Java property that should be set on the JVM when running the application. |
| `name` | no | `string` |  |
| `serviceName` | no | `string` | This is how the service type is identified in the SignalFx UI so that you can get built-in content for it. For custom JMX integrations, it can be set to whatever you like, and metrics will get the special property `sf_hostHasService` set to this value. |
| `serviceURL` | no | `string` | The JMX connection string. This is rendered as a Go template and has access to the other values in this config. NOTE: under normal circumstances it is not advised to set this string directly; setting the host and port as specified above is preferred. (**default:** `service:jmx:rmi:///jndi/rmi://{{.Host}}:{{.Port}}/jmxrmi`) |
| `instancePrefix` | no | `string` |  |
| `username` | no | `string` |  |
| `password` | no | `string` |  |
| `customDimensions` | no | `map of strings` | Takes in key-value pairs of custom dimensions at the connection level. |
| `mBeansToCollect` | no | `list of strings` | A list of the MBeans defined in `mBeanDefinitions` to actually collect. If not provided, all defined MBeans will be collected. |
| `mBeansToOmit` | no | `list of strings` | A list of the MBeans to omit. This comes in handy when only a few MBeans need to be omitted from the default list. |
| `mBeanDefinitions` | no | `map of objects (see below)` | Specifies how to map JMX MBean values to metrics. Specific service monitors such as cassandra, kafka, or activemq come pre-loaded with a set of mappings, and any that you add in this option will be merged with those. See collectd GenericJMX for more details. |
| `clusterName` | yes | `string` | Cluster name to which the broker belongs |
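The options above combine into a minimal agent monitor block. This is a sketch with placeholder values; only `host`, `port`, and `clusterName` are required:

```yaml
monitors:
  - type: collectd/kafka
    host: 127.0.0.1          # placeholder: broker host with remote JMX enabled
    port: 9999               # placeholder: value of com.sun.management.jmxremote.port
    clusterName: kafka-prod  # placeholder cluster name (required)
    customDimensions:
      env: staging           # example connection-level dimension (illustrative)
```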

The nested mBeanDefinitions config object has the following fields:

| Config option | Required | Type |
| --- | --- | --- |
| `objectName` | no | `string` |
| `instancePrefix` | no | `string` |
| `instanceFrom` | no | `list of strings` |
| `values` | no | `list of objects (see below)` |
| `dimensions` | no | `list of strings` |

The nested values config object has the following fields:

| Config option | Required | Type |
| --- | --- | --- |
| `type` | no | `string` |
| `table` | no | `bool` (**default:** `false`) |
| `instancePrefix` | no | `string` |
| `instanceFrom` | no | `list of strings` |
| `attribute` | no | `string` |
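Putting the two nested objects together, a custom MBean mapping might look like the following sketch. The definition name, object name, and attribute are illustrative and would need to match real MBeans exposed by your broker:

```yaml
mBeanDefinitions:
  kafka-total-fetch-requests:          # hypothetical definition name
    objectName: "kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec"
    instancePrefix: "kafka-total-fetch-requests"
    values:
      - type: counter                  # collectd data-set type for the emitted metric
        table: false                   # single value, not a table of composite data
        attribute: Count               # JMX attribute to read (assumption)
```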

## Metrics

The following table lists the metrics available for this monitor. Metrics that are marked as Included are standard metrics and are monitored by default.

| Name | Type | Included | Description |
| --- | --- | --- | --- |
| `counter.kafka-all-bytes-in` | cumulative |  | Number of bytes received per second across all topics |
| `counter.kafka-all-bytes-out` | cumulative |  | Number of bytes transmitted per second across all topics |
| `counter.kafka-log-flushes` | cumulative |  | Number of log flushes per second |
| `counter.kafka-messages-in` | cumulative | ✔ | Number of messages received per second across all topics |
| `counter.kafka.fetch-consumer.total-time.count` | cumulative | ✔ | Number of fetch requests from consumers per second across all partitions |
| `counter.kafka.fetch-follower.total-time.count` | cumulative |  | Number of fetch requests from followers per second across all partitions |
| `counter.kafka.produce.total-time.99th` | gauge |  | 99th percentile of time in milliseconds to process produce requests |
| `counter.kafka.produce.total-time.count` | cumulative | ✔ | Number of producer requests |
| `counter.kafka.produce.total-time.median` | gauge |  | Median time it takes to process a produce request |
| `gauge.kafka-active-controllers` | gauge | ✔ | Specifies whether the broker is an active controller |
| `gauge.kafka-log-flush-time-ms` | gauge |  | Average number of milliseconds to flush a log |
| `gauge.kafka-log-flush-time-ms-p95` | gauge |  | 95th percentile of log flush time in milliseconds |
| `gauge.kafka-request-queue` | gauge | ✔ | Number of requests in the request queue across all partitions on the broker |
| `gauge.kafka-underreplicated-partitions` | gauge | ✔ | Number of underreplicated partitions across all topics on the broker |
| `gauge.kafka.fetch-consumer.total-time.99th` | gauge | ✔ | 99th percentile of time in milliseconds to process fetch requests from consumers |
| `gauge.kafka.fetch-consumer.total-time.median` | gauge | ✔ | Median time it takes to process a fetch request from consumers |
| `gauge.kafka.fetch-follower.total-time.99th` | gauge | ✔ | 99th percentile of time in milliseconds to process fetch requests from followers |
| `gauge.kafka.fetch-follower.total-time.median` | gauge | ✔ | Median time it takes to process a fetch request from followers |
| `kafka-isr-expands` | cumulative |  | When a broker is brought up after a failure, it starts catching up by reading from the leader. Once it is caught up, it gets added back to the ISR. |
| `kafka-isr-shrinks` | cumulative |  | When a broker goes down, the ISR for some of the partitions will shrink. When that broker is up again, the ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0. |
| `kafka-leader-election-rate` | cumulative |  | Number of leader elections |
| `kafka-max-lag` | gauge |  | Maximum lag in messages between the follower and leader replicas |
| `kafka-offline-partitions-count` | gauge |  | Number of partitions that don't have an active leader and are hence not writable or readable |
| `kafka-unclean-elections` | cumulative |  | Number of unclean leader elections. This happens when a leader goes down and an out-of-sync replica is chosen to be the leader |

To specify custom metrics you want to monitor, add a metricsToInclude filter to the agent configuration, as shown in the code snippet below. The snippet lists all available custom metrics. You can copy and paste the snippet into your configuration file, then delete any custom metrics that you do not want sent.

Note that some of the custom metrics require you to set a flag as well as add them to the list. Check the monitor configuration file to see if a flag is required for gathering additional metrics.

```yaml
metricsToInclude:
  - metricNames:
    - counter.kafka-all-bytes-in
    - counter.kafka-all-bytes-out
    - counter.kafka-log-flushes
    - counter.kafka.fetch-follower.total-time.count
    - counter.kafka.produce.total-time.99th
    - counter.kafka.produce.total-time.median
    - gauge.kafka-log-flush-time-ms
    - gauge.kafka-log-flush-time-ms-p95
    - kafka-isr-expands
    - kafka-isr-shrinks
    - kafka-leader-election-rate
    - kafka-max-lag
    - kafka-offline-partitions-count
    - kafka-unclean-elections
    monitorType: collectd/kafka
```