I/O imbalance across bookies #24010
Comments
Apache BookKeeper does not natively support load balancing between bookies.
You probably have some naughty partitions with heavy write throughput; you can check with the following Prometheus query:
If you discover partitions with excessive write load, consider increasing the number of partitions for the affected topics. This will help distribute the write throughput more evenly across your bookie instances.
Notes about Pulsar message publishing:
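As a rough illustration of the hot-partition check suggested in this comment, the sketch below flags partitions whose write rate is far above the median. It assumes the per-partition write rates have already been exported from your monitoring system into a dictionary; the function name `find_hot_partitions`, the `factor` threshold, and the sample topic names and values are all hypothetical and not taken from this thread.

```python
from statistics import median


def find_hot_partitions(write_rates, factor=3.0):
    """Return (partition, rate) pairs whose rate exceeds factor * median rate.

    write_rates: dict mapping partition name -> write throughput in bytes/s.
    factor: hypothetical threshold multiplier; tune it for your cluster.
    """
    if not write_rates:
        return []
    baseline = median(write_rates.values())
    if baseline <= 0:
        return []
    hot = [(name, rate) for name, rate in write_rates.items()
           if rate > factor * baseline]
    # Heaviest writers first
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical sample data (bytes/s per partition), for illustration only
sample = {
    "orders-partition-0": 5_000_000,
    "orders-partition-1": 6_000_000,
    "orders-partition-2": 90_000_000,
    "orders-partition-3": 4_000_000,
}

for name, rate in find_hot_partitions(sample):
    print(f"{name}: {rate / 1e6:.1f} MB/s")
```

Using the median as the baseline keeps a single very heavy partition from raising the threshold, which a mean-based baseline would do.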
Is there any other process that might be generating I/O writes in your deployment? You can check this using
We are sure there is no other process, so we are also confused about why the bookie_WRITE_BYTES metric is so different from diskio.bytes_written.
Is bookie_WRITE_BYTES consistent with the I/O write metric for the journal disks? You may want to check compaction activity on the ledger disks for further investigation. Can you provide a demo to reproduce this problem?
We found that the reason for this problem is that in the default configuration
Discussed in #24009
Originally posted by Jayer23 February 20, 2025
Description
We observed an abnormal write pattern in our Pulsar cluster:
The bookie metrics show little difference in read and write I/O between nodes. The producer has compression enabled.
However, node-level monitoring shows that I/O is greatly amplified and differs widely between nodes: some nodes see less than 50 MB/s, while others see more than 450 MB/s.
Journal disks on some nodes show very high write throughput (e.g., 400+ MB/s).
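To make the two symptoms above concrete, here is a minimal sketch that computes a per-node write amplification (disk-level bytes written divided by the bookie-reported write bytes) and the max/min imbalance across nodes. The 50 MB/s and 450 MB/s disk figures are the bounds quoted in this report; the bookie-level figures, the node names, and the function names are hypothetical placeholders chosen only for illustration.

```python
def write_amplification(disk_bytes_per_s, bookie_bytes_per_s):
    """Ratio of disk-level write throughput to bookie-reported write throughput."""
    if bookie_bytes_per_s <= 0:
        raise ValueError("bookie write rate must be positive")
    return disk_bytes_per_s / bookie_bytes_per_s


def imbalance_ratio(rates):
    """Ratio between the busiest and the least busy node."""
    lowest, highest = min(rates.values()), max(rates.values())
    if lowest <= 0:
        raise ValueError("rates must be positive")
    return highest / lowest


# Disk-level write rates (bytes/s), using the bounds quoted in the report
node_disk = {"bookie-1": 50_000_000, "bookie-2": 450_000_000}
# Bookie-reported write rates (bytes/s): hypothetical values, assumed similar per node
node_bookie = {"bookie-1": 40_000_000, "bookie-2": 45_000_000}

for node in node_disk:
    amp = write_amplification(node_disk[node], node_bookie[node])
    print(f"{node}: amplification {amp:.2f}x")

print(f"disk-level imbalance: {imbalance_ratio(node_disk):.1f}x")
print(f"bookie-level imbalance: {imbalance_ratio(node_bookie):.3f}x")
```

With these placeholder inputs, the bookie-level view looks balanced while the disk-level view does not, which is the discrepancy the report describes.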
Environment
3.0.7
3 replicas, 2 quorum writes
Expected Behavior
Write operations should be balanced between bookies