Termination due to Bus error (signal 7) #17

Open
maximilianigl opened this issue Nov 30, 2018 · 5 comments

Comments

@maximilianigl

Hi,

I'm running the code using mpirun inside a Docker container. It worked at first, but recently I started getting this error message:

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 135
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)

As far as I'm aware, I didn't change anything. Does anyone know where this might be coming from?
Thanks a lot!
Best, Max
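
For readers decoding the log above: by the usual shell convention, an exit code above 128 means the process was killed by a signal, and the signal number is the exit code minus 128. So 135 corresponds to signal 7, which matches the "Bus error (signal 7)" string. A minimal Python sketch of that arithmetic follows; the helper names `signal_from_exit_code` and `describe` are made up for illustration, and the signal name lookup depends on the platform (7 is SIGBUS on typical x86 Linux, but not on every OS).

```python
import signal


def signal_from_exit_code(code):
    """Return the signal number encoded in a shell-style exit code, or None.

    Assumes the common convention that exit codes above 128 mean
    "terminated by signal (code - 128)".
    """
    if code > 128:
        return code - 128
    return None


def describe(code):
    """Build a short human-readable description of an exit code."""
    signum = signal_from_exit_code(code)
    if signum is None:
        return f"exit code {code}: ordinary exit status"
    try:
        # Platform-dependent: signal numbering differs between operating systems.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return f"exit code {code}: killed by signal {signum} ({name})"


# 135 is the exit code from the error message in this issue.
print(describe(135))
```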

@maximilianigl
Author

maximilianigl commented Nov 30, 2018

It doesn't seem to happen when I use fewer parallel runs (30 instead of 40 or 50, on a server that supports up to 56 threads).

@Muguangfeng

Hi! My computer has two GPUs. When I run this code, the utilization of the first one is only 5%, and the second one is at zero. Do you know why my GPU utilization is so low? Do I need to modify the code to match each machine's configuration?
Thanks!

@Up-Huang

Maybe this error stems from the nodes we used.

@falcaoceg

Same thing here bro. For all I know, it seems to have something to do with Docker's default shared memory. I'm not 100% sure yet, but for now I've increased the container's shared memory from 64 MB to many GB to test.

@qiuyuleng1

> Same thing here bro. For all I know, it seems to have something to do with Docker's default shared memory. I'm not 100% sure yet, but for now I've increased the container's shared memory from 64 MB to many GB to test.

It works for me. When I run the Docker image, I add --shm-size=2000g
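
If you want to confirm that shared memory is the limiting factor before changing anything, one option is to check the size of /dev/shm from inside the container. The Python sketch below is only an illustration under stated assumptions: `shm_report` is a hypothetical helper name, and the 1 GiB threshold is an arbitrary example value, not a requirement of this project.

```python
import os
import shutil


def shm_report(path="/dev/shm", min_bytes=1 << 30):
    """Return (total_bytes, free_bytes, ok) for the filesystem at `path`.

    `ok` is True when the total size is at least `min_bytes`.
    The 1 GiB default is an arbitrary example threshold.
    """
    usage = shutil.disk_usage(path)
    return usage.total, usage.free, usage.total >= min_bytes


if __name__ == "__main__":
    shm_path = "/dev/shm"
    if os.path.isdir(shm_path):
        total, free, ok = shm_report(shm_path)
        print(f"{shm_path}: total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")
        if not ok:
            print("Shared memory looks small; consider a larger --shm-size.")
    else:
        print(f"{shm_path} not found on this system")
```

A container started without the --shm-size option typically reports about 64 MiB here, which is the Docker default mentioned above. After restarting the container with a larger --shm-size value, the reported total should grow accordingly.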
