[Feature] Allowing the _id field in the segment index to be automatically generated by Elasticsearch (ES) will significantly improve the performance of bulk insertion, potentially by several times. #13089
Replies: 1 comment 9 replies
First of all, each ID is inserted or updated at intervals of over 20s by default, unless you changed that privately, so I can't see the race condition. This was discussed a long time ago, and the conclusion is very clear. Concerns about Elasticsearch performance and resource cost are common, which is why we are all in on BanyanDB, which is built by the SkyWalking community itself and focuses on SkyWalking use cases. Elasticsearch is not our first priority anymore, because there are clearly several conflicts that can't be resolved perfectly.
Search before asking
Description
In SkyWalking version 9.x, I noticed that the CPU usage of Elasticsearch (ES) in my cluster was very high. After checking the hot_threads, I found that the majority of the CPU consumption was attributed to PerThreadIDVersionAndSeqNoLookup.lookupVersion. This issue typically arises when ES ensures the uniqueness of _id. Upon inspecting the _id in the segment index, I discovered that the _id was indeed specified by SkyWalking. To address this, I created an ingest pipeline to remove the _id generated by SkyWalking and allowed ES to generate the IDs automatically. After implementing this change, the CPU usage dropped to about 10% of its previous level, and the hot_threads no longer showed this overhead. Additionally, the slow tasks related to segment batching disappeared from the ES task list.
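For anyone who wants to try the same workaround, here is a minimal sketch of the kind of ingest pipeline described above. It only builds the request bodies and does not contact a cluster. The pipeline name `drop-skywalking-id` and the helper function names are hypothetical, and whether a `remove` processor is accepted on the `_id` metadata field may depend on your Elasticsearch version, so treat this as an assumption to verify rather than a confirmed recipe. Note also that with auto-generated IDs a second write for the same segment creates a new document instead of updating the existing one.

```python
import json


def build_remove_id_pipeline():
    """Pipeline body that drops the client-supplied _id so ES generates one.

    Assumption: the target Elasticsearch version allows the `remove`
    processor to act on the `_id` metadata field.
    """
    return {
        "description": "Drop the _id supplied by SkyWalking (illustrative)",
        "processors": [
            {"remove": {"field": "_id", "ignore_missing": True}},
        ],
    }


def pipeline_put_request(name):
    """Return (method, path, body) for the create-pipeline call."""
    return "PUT", f"/_ingest/pipeline/{name}", json.dumps(build_remove_id_pipeline())


def default_pipeline_settings(name):
    """Index settings body that applies the pipeline to every write on an index."""
    return {"index": {"default_pipeline": name}}


if __name__ == "__main__":
    # Hypothetical pipeline name, used only for illustration.
    method, path, body = pipeline_put_request("drop-skywalking-id")
    print(method, path)
    print(body)
    print(json.dumps(default_pipeline_settings("drop-skywalking-id")))
```

The first body would be sent to the create-pipeline endpoint, and the second to the `_settings` endpoint of the segment index (or placed in its index template) so that bulk writes pass through the pipeline.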
Use case
Improve performance
Related issues
No response
Are you willing to submit a pull request to implement this on your own?
Code of Conduct