
GPU Programming Model


CPUs and GPUs have separate memory, which means that working on both the host and the device may involve managing the transfer of data between the memory on the host and that on the GPU.


In Castro, the core design principle when running on GPUs is that all of the compute should be done on the GPU.


When we compile with USE_CUDA=TRUE or USE_HIP=TRUE, AMReX will allocate a pool of memory on the GPUs and all of the StateData will be stored there. As long as we do all of the computation on the GPUs, we don’t need to manage any of the data movement manually.
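For example, a GPU build of a Castro problem with the GNU make system can be requested on the command line (the compiler choice here is just an illustration):

    make USE_CUDA=TRUE COMP=gnu -j 4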


Note


We can tell AMReX to allocate the data using managed memory by setting:

amrex.the_arena_is_managed = 1

This is generally not needed.


The programming model used throughout Castro is C++ lambda capture by value. We access the FArrayBox stored in the StateData MultiFab by creating an Array4 object. The Array4 does not directly store a copy of the data, but instead holds a pointer to the data in the FArrayBox. When we capture the Array4 by value in a GPU kernel, the GPU gets access to the pointer to the underlying data.
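As an illustration, here is a minimal sketch of this pattern on a single-component MultiFab mf (the function and the scaling operation are made up for this example, not code from Castro):

    #include <AMReX_MultiFab.H>

    void scale_data (amrex::MultiFab& mf)
    {
        for (amrex::MFIter mfi(mf, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {
            const amrex::Box& bx = mfi.tilebox();

            // the Array4 holds a pointer to the FArrayBox data, not a copy
            auto const& a = mf.array(mfi);

            // the lambda captures the Array4 by value, so the GPU kernel
            // receives the pointer to the underlying data
            amrex::ParallelFor(bx,
            [=] AMREX_GPU_DEVICE (int i, int j, int k)
            {
                a(i,j,k) *= 2.0;
            });
        }
    }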


Most AMReX functions will work on the data directly on the GPU (like .setVal()).
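For example, filling a MultiFab in place runs on the device when the data live there:

    mf.setVal(0.0);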


In rare instances where we might need to operate on the data on the host, we can force a copy to the host, do the work, and then copy back. For an example, see the reduction done in Gravity.cpp.
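A minimal sketch of this pattern, assuming a recent AMReX that provides amrex::dtoh_memcpy and amrex::htod_memcpy for MultiFabs (the host-side work itself is elided):

    #include <AMReX_MultiFab.H>
    #include <AMReX_MultiFabUtil.H>

    void work_on_host (amrex::MultiFab& mf)
    {
        // host-side copy of the data, allocated in pinned host memory
        amrex::MultiFab host_mf(mf.boxArray(), mf.DistributionMap(),
                                mf.nComp(), mf.nGrow(),
                                amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()));

        amrex::dtoh_memcpy(host_mf, mf);   // device -> host

        // ... operate on host_mf on the CPU ...

        amrex::htod_memcpy(mf, host_mf);   // host -> device
    }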


Note


For a thorough discussion of how the AMReX GPU offloading works, see [57].


Runtime parameters


The main exception to all data living on the GPUs all the time is the runtime parameters. At the moment, these are allocated as managed memory and stored in global memory. This simply makes it easier to read them in and initialize them on the CPU at runtime.
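The effect is that a GPU kernel can read a parameter without any explicit data movement. A minimal sketch of the idea (the parameter name here is hypothetical, not an actual Castro parameter):

    #include <AMReX_REAL.H>
    #include <AMReX_GpuQualifiers.H>

    // a hypothetical runtime parameter placed in managed global memory;
    // it is set on the CPU at initialization (e.g. from the inputs file)
    // and can then be read directly inside GPU kernels
    AMREX_GPU_MANAGED amrex::Real example_param = 1.0e-10;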
