Issues
Issues with using ChaNGa and how to address them are listed here.
There is an issue with GCC 6.1.X and 6.2.X and Charm++, evidently an over-optimization that results in a crash immediately after reading in the particles. To work around this, either use an earlier compiler version, or add -fno-lifetime-dse to the charm build command. See https://charm.cs.illinois.edu/redmine/issues/1045 for more details.
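For example, a charm build command with the workaround flag appended might look like the following (the netlrts-linux-x86_64 target, -j8, and --with-production are illustrative; use whatever target and options you normally build with):

./build ChaNGa netlrts-linux-x86_64 -j8 --with-production -fno-lifetime-dse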
For the "net" builds of charm++/ChaNGa, the common problem is starting ChaNGa on multiple nodes of your compute cluster. For MPI and other builds, this is taken care of by the cluster infrastructure, but for net builds, you are directly facing this problem.
"charmrun", which gets built when you "make" ChaNGa, is the program that handles this. If your cluster does have MPI installed, the easiest way to start things up is with
charmrun +p<procs> ++mpiexec ChaNGa cosmo.param
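As an illustration, a minimal Slurm batch script using this approach might look like the following (the node and task counts, and the scheduler directives, are hypothetical; adapt them to your site):

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
# 4 nodes x 16 tasks per node = 64 worker processes
./charmrun +p64 ++mpiexec ./ChaNGa cosmo.param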
However, if your "mpiexec" is not the way you start an MPI program on your cluster, then you may need to write a wrapper. E.g. for the TACC clusters (stampede and lonestar) a wrapper would contain:
#!/bin/csh
shift; shift; exec ibrun $*

and you would call it with:
charmrun +p<procs> ++mpiexec ++remote-shell mympiexec ChaNGa cosmo.param
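The wrapper (here called mympiexec, an arbitrary name) must be executable and either on your PATH or given to ++remote-shell as a full path; a typical setup step would be:

chmod +x mympiexec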
If MPI is not available, then charmrun will look at a nodelist file which has the format:
group main
host node1
host node2

In order for this to work, you need to be able to ssh into those nodes without a password. If your cluster is not set up to enable this by default, set up passwordless login using public keys (a minimal sketch is given below). If you have interactive access to the compute nodes (e.g. with qsub -I), a quick way to test this within the interactive session is to execute the command ssh node1 $PWD/ChaNGa. If ChaNGa starts and gives a help message, then things are set up correctly. Otherwise the error message can help you diagnose the problem. Potential problems include: host keys not installed, user public keys not installed, and shared libraries not accessible.
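A minimal sketch for setting up passwordless login, assuming the compute nodes share your home filesystem (adjust the key type and paths as needed for your site):

ssh-keygen -t ed25519                                  # accept the default file; leave the passphrase empty
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys    # authorize your own key on the shared home
chmod 600 ~/.ssh/authorized_keys
ssh node1 true                                         # should return without prompting for a password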
There are many sanity checks within the code using the assert() call. Here are some common ones with explanations of what has gone wrong.
<code>Assertion "bInBox" failed in file TreePiece.cpp line 622</code>
This happens when running with periodic boundary conditions and a particle is WAY outside the fiducial box. This is an indication of bad initial conditions or "superluminal" velocities.
<code>Assertion "numIterations < 1000" failed in file Sorter.cpp line 806.</code>
Here domain decomposition has failed to divide the particles evenly among the domains to within a reasonable tolerance. This could be due to a pathological particle distribution, such as having all particles on top of each other. One solution is to loosen the tolerance by increasing the "ddTolerance" constant in ParallelGravity.h and recompile.
Memory use can be an issue in large simulations. One of the big uses of memory in ChaNGa is the caching of off-processor data. This can be lowered by decreasing the depth of the cache "lines" with "-d" or "nCacheDepth". The default is 4, and the size of a line scales as 2^d. Higher values mean more remote data is fetched at once, reducing latency costs at the price of higher memory use.
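For example, to reduce the cache depth from the default of 4 to 3, pass it on the command line or set it in the parameter file (the launch command is illustrative, and the parameter-file line assumes the usual name = value format):

./charmrun +p64 ./ChaNGa -d 3 cosmo.param

# or, in cosmo.param:
# nCacheDepth = 3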
Deadlocks are hard to track down. One common deadlock is a process getting held up in a lock within malloc() or free(). This will happen if you link with "-memory os" instead of using the charm++ default memory allocator and the OS malloc is not thread safe.
The CUDA implementation is still experimental.
<code>Fatal CUDA Error all CUDA-capable devices are busy or unavailable at cuda-hybrid-api.cu:571.</code>
This means 1) there are no GPUs on the host, or 2) more than one process is trying to access the GPU. For scenario 2, you might have more than one ChaNGa process on the host competing for the GPU. Either run in SMP mode with only one process per GPU host, or use the CUDA Multi Process Service (CUDA_MPS) to handle this situation. For Cray machines, setting the environment variable CRAY_CUDA_MPS=1
enables this. However, many compute clusters do not support this.
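A sketch of the two approaches in a job script (launch commands are illustrative; MPS setup details and permissions vary by site, and some clusters disallow it entirely):

# On Cray systems: enable the Cray-provided MPS support
export CRAY_CUDA_MPS=1

# On generic Linux GPU hosts: start the MPS control daemon before launching
nvidia-cuda-mps-control -d

# Then launch with one (or only a few) processes per GPU host
./charmrun +p8 ++mpiexec ./ChaNGa cosmo.param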