GPU Programming Model
CPUs and GPUs have separate memory, which means that working on both the host and device may involve managing the transfer of data between the memory on the host and that on the GPU.
In Castro, the core design when running on GPUs is that all of the compute should be done on the GPU. When we compile with USE_CUDA=TRUE or USE_HIP=TRUE, AMReX will allocate a pool of memory on the GPUs and all of the StateData will be stored there. As long as we then do all of the computation on the GPUs, we don't need to manage any of the data movement manually.
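For example, a GPU-enabled executable is selected at compile time through the make variable; a typical invocation might look like the following (the exact options depend on the problem setup):

    make USE_CUDA=TRUE -j 4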
Note

We can tell AMReX to allocate the data using managed memory by setting:

    amrex.the_arena_is_managed = 1

This is generally not needed.
The programming model used throughout Castro is C++ lambda capture by value. We access the FArrayBox stored in the StateData MultiFab by creating an Array4 object. The Array4 does not directly store a copy of the data, but instead holds a pointer to the data in the FArrayBox. When we capture the Array4 by value in a GPU kernel, the GPU gets access to the pointer to the underlying data.
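As an illustration, a minimal sketch of this pattern might look like the following (the function name scale_state and the scaling operation are made up for this example; MFIter, ParallelFor, and Array4 are the standard AMReX constructs):

    #include <AMReX_MultiFab.H>
    #include <AMReX_GpuLaunch.H>

    // Scale every value of a MultiFab on the device.  The Array4 s is
    // captured by value into the device lambda, so the kernel receives
    // the pointer to the FArrayBox data already resident in GPU memory.
    void scale_state (amrex::MultiFab& state, amrex::Real factor)
    {
        for (amrex::MFIter mfi(state, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {
            const amrex::Box& bx = mfi.tilebox();
            auto const& s = state.array(mfi);

            amrex::ParallelFor(bx, state.nComp(),
            [=] AMREX_GPU_DEVICE (int i, int j, int k, int n)
            {
                s(i,j,k,n) *= factor;
            });
        }
    }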
Most AMReX functions will work on the data directly on the GPU (like .setVal()).
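For instance, assuming state is a MultiFab allocated in the GPU arena, the following runs entirely on the device with no host transfer:

    state.setVal(0.0);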
In rare instances where we might need to operate on the data on the host, we can force a copy to the host, do the work, and then copy back. For an example, see the reduction done in Gravity.cpp.
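A minimal sketch of the copy-to-host pattern (not the actual code in Gravity.cpp; the function and buffer names here are made up) might look like:

    #include <AMReX_FArrayBox.H>
    #include <AMReX_GpuContainers.H>
    #include <AMReX_GpuDevice.H>

    // Copy an FArrayBox's device data to a pinned host buffer, do some
    // work on the CPU, and copy the result back to the device.
    void work_on_host (amrex::FArrayBox& devfab)
    {
        const amrex::Long n = devfab.box().numPts() * devfab.nComp();

        amrex::Gpu::PinnedVector<amrex::Real> hostdata(n);

        // device -> host
        amrex::Gpu::dtoh_memcpy(hostdata.data(), devfab.dataPtr(),
                                n * sizeof(amrex::Real));

        // ... operate on hostdata on the CPU here ...

        // host -> device
        amrex::Gpu::htod_memcpy(devfab.dataPtr(), hostdata.data(),
                                n * sizeof(amrex::Real));
    }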
Note

For a thorough discussion of how the AMReX GPU offloading works, see [57].
Runtime parameters

The main exception to all data living on the GPUs at all times is the runtime parameters. At the moment, these are allocated as managed memory and stored in global memory. This is simply to make it easier to read them in and initialize them on the CPU at runtime.
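A sketch of this idea (the parameter name and the read function are hypothetical, not Castro's actual implementation):

    #include <AMReX_GpuQualifiers.H>
    #include <AMReX_REAL.H>
    #include <AMReX_ParmParse.H>

    // A runtime parameter in managed global memory: it can be set on
    // the CPU when the inputs file is read, and then read directly
    // inside GPU kernels without an explicit copy.
    AMREX_GPU_MANAGED amrex::Real some_param = 0.0;

    void init_params ()
    {
        amrex::ParmParse pp("castro");
        pp.query("some_param", some_param);  // initialized on the CPU at runtime
    }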