ENH: Reduce number of processes spawned by ProcessPoolExecutor
in blocked_io.py
#426
Comments
Hi, thanks for looking into it. Using the maximum CPU count is a good approximation for normal-sized machines. On a large machine with 48+ cores, I can see how this could start thrashing the machine. MDIO takes the following environment variable to control the number of CPUs used in ingestion:
We noticed this issue only after we saw some random ingestion failures. Out of two similar files, one fails and the other succeeds. The failed one sometimes succeeds if resubmitted multiple times. This costs time in production. I believe setting the value to 80% of the available logical CPUs would take care of the issue. It would also leave some room for other processes (like OpenTelemetry) to run.
I believe, setting Also, setting We can make mdio-python/src/mdio/segy/blocked_io.py Line 40 in 1475793
FYI: We routinely use 48+ core machines in production.
I will update here with some benchmarking results to validate the change.
@amitpendharkar any updates on benchmarks? I am trying to close issues, and this one is still pending. Currently we are happy with the defaults and the environment variable configuration.
Issue:
ProcessPoolExecutor spawns processes based on its max_workers argument. The total number of logical CPUs is currently used as the default value for max_workers (mdio-python/src/mdio/segy/blocked_io.py, line 124 in 1475793).
This spawns too many processes. Depending on the size of the SEG-Y file, resource contention and context switching bog the machine down, and the run eventually gets terminated.
Suggested solution:
Make the default value of max_workers either the number of cores or 80% of the available logical CPUs (to be discussed).
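A minimal sketch of the suggestion above, to make the discussion concrete. It is not the code in blocked_io.py: the names default_max_workers and run_in_pool are hypothetical, and the 0.8 fraction is simply the value proposed in this issue, not a benchmarked one.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def default_max_workers(
    fraction: float = 0.8,
    n_tasks: Optional[int] = None,
    logical_cpus: Optional[int] = None,
) -> int:
    """Hypothetical helper: cap workers at a fraction of the logical CPUs.

    logical_cpus is only a hook for testing; by default it comes from
    os.cpu_count(), which may return None on some platforms.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    # Always keep at least one worker, even on a single-CPU machine.
    capped = max(1, math.floor(logical_cpus * fraction))
    if n_tasks is not None:
        # No point in spawning more processes than there are tasks.
        capped = max(1, min(capped, n_tasks))
    return capped


def run_in_pool(func, items, fraction: float = 0.8):
    """Hypothetical usage: map func over items with the capped pool size."""
    items = list(items)
    workers = default_max_workers(fraction=fraction, n_tasks=len(items))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

Under these assumptions, a 48-core machine would get 38 workers instead of 48, leaving some headroom for other processes.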