Put input parameters for MPI in a separate category
AlexanderSinn committed Mar 13, 2024
1 parent 89e7976 commit f7c9b1a
Showing 10 changed files with 37 additions and 42 deletions.
2 changes: 1 addition & 1 deletion docs/source/building/platforms/booster_jsc.rst
@@ -65,7 +65,7 @@ and use it to submit a simulation.

.. tip::
Parallel simulations can be largely accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set.
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set.

Note that using GPU-aware MPI may require more GPU memory.
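
For illustration, a minimal sketch of the two ways this parameter can be set (the executable path, input-file name, and ``srun`` options are placeholders, not taken from this repository):

.. code-block:: bash

    # either put the line  comms_buffer.on_gpu = 1  into the input file, or
    # append it to the run command as an AMReX-style command-line override:
    srun ./hipace inputs comms_buffer.on_gpu=1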

2 changes: 1 addition & 1 deletion docs/source/building/platforms/lumi_csc.rst
@@ -101,7 +101,7 @@ and use it to submit a simulation.
.. tip::
Parallel simulations can be largely accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:
.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/source/building/platforms/maxwell_desy.rst
@@ -70,7 +70,7 @@ for more details and the required constraints). Please set the value accordingly
.. tip::
Parallel simulations can be largely accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:

.. code-block:: bash
4 changes: 2 additions & 2 deletions docs/source/building/platforms/perlmutter_nersc.rst
@@ -79,7 +79,7 @@ You can then create your directory in your ``$PSCRATCH``, where you can put your
export MPICH_OFI_NIC_POLICY=GPU
# for GPU-aware MPI use the first line
#HIPACE_GPU_AWARE_MPI="hipace.comms_buffer_on_gpu=1"
#HIPACE_GPU_AWARE_MPI="comms_buffer.on_gpu=1"
HIPACE_GPU_AWARE_MPI=""
# CUDA visible devices are ordered inverse to local task IDs
@@ -94,6 +94,6 @@ and use it to submit a simulation. Note, that this example simulation runs on 8

.. tip::
Parallel simulations can be largely accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set (see the job script above).
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set (see the job script above).

Note that using GPU-aware MPI may require more GPU memory.
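
A minimal sketch of how the ``HIPACE_GPU_AWARE_MPI`` helper variable from the job script above is typically consumed (the actual ``srun`` line is not shown in this excerpt; the executable path and ``srun`` options below are placeholders):

.. code-block:: bash

    # uncomment the first assignment in the job script to enable GPU-aware MPI ...
    #HIPACE_GPU_AWARE_MPI="comms_buffer.on_gpu=1"
    HIPACE_GPU_AWARE_MPI=""
    # ... and append the variable to the run command, where it acts as a
    # command-line override of the corresponding input parameter
    srun ./hipace inputs ${HIPACE_GPU_AWARE_MPI}
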
12 changes: 6 additions & 6 deletions docs/source/run/parameters.rst
@@ -80,24 +80,24 @@ General parameters
By default, we use the ``nosmt`` option, which overwrites the OpenMP default of spawning one thread per logical CPU core, and instead only spawns a number of threads equal to the number of physical CPU cores on the machine.
If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.

* ``hipace.comms_buffer_on_gpu`` (`bool`) optional (default `0`)
* ``comms_buffer.on_gpu`` (`bool`) optional (default `0`)
Whether the buffers that hold the beam and the 3D laser envelope should be allocated on the GPU (device memory).
By default they will be allocated on the CPU (pinned memory).
Setting this option to `1` is necessary to take advantage of GPU-aware MPI; for this,
additional environment variables need to be set depending on the system.

* ``hipace.comms_buffer_max_leading_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_leading_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be received and stored in advance.

* ``hipace.comms_buffer_max_trailing_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_trailing_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be stored before being sent. Using
``comms_buffer_max_leading_slices`` and ``comms_buffer_max_trailing_slices`` will in principle
``comms_buffer.max_leading_slices`` and ``comms_buffer.max_trailing_slices`` will in principle
limit the degree of asynchrony in the parallel communication and may thus reduce performance.
However, it may be necessary to set these parameters to avoid all slices accumulating on a single
rank that would run out of memory (out of CPU or GPU memory depending on ``hipace.comms_buffer_on_gpu``).
rank that would run out of memory (out of CPU or GPU memory depending on ``comms_buffer.on_gpu``).
If there are more time steps than ranks, these parameters must be chosen such that between all
ranks there is enough capacity to store every slice to avoid a deadlock, i.e.
``comms_buffer_max_trailing_slices * nranks > nslices``.
``comms_buffer.max_trailing_slices * nranks > nslices``.

* ``hipace.do_tiling`` (`bool`) optional (default `true`)
Whether to use tiling, when running on CPU.
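
For illustration of the renamed ``comms_buffer`` category, a hypothetical input-file excerpt could read as follows (the values are placeholders, not recommendations). With, for example, 1024 longitudinal slices and 8 ranks, ``comms_buffer.max_trailing_slices * nranks > nslices`` requires ``max_trailing_slices`` to exceed 1024 / 8 = 128:

.. code-block:: text

    comms_buffer.on_gpu = 1                 # allocate the MPI buffers in device memory
    comms_buffer.max_leading_slices = 256   # slices that may be received in advance
    comms_buffer.max_trailing_slices = 256  # slices that may be stored before being sent
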
6 changes: 0 additions & 6 deletions src/Hipace.H
@@ -258,12 +258,6 @@ public:
amrex::Parser m_salame_parser;
/** Function to get the target Ez field for SALAME */
amrex::ParserExecutor<3> m_salame_target_func;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_comms_buffer_on_gpu = false;
/** How many slices of beam particles can be received in advance */
int m_comms_buffer_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_comms_buffer_max_trailing_slices = std::numeric_limits<int>::max();

private:

20 changes: 6 additions & 14 deletions src/Hipace.cpp
@@ -140,19 +140,14 @@ Hipace::Hipace () :
#endif

queryWithParser(pph, "background_density_SI", m_background_density_SI);
queryWithParser(pph, "comms_buffer_on_gpu", m_comms_buffer_on_gpu);
queryWithParser(pph, "comms_buffer_max_leading_slices", m_comms_buffer_max_leading_slices);
queryWithParser(pph, "comms_buffer_max_trailing_slices", m_comms_buffer_max_trailing_slices);
DeprecatedInput("hipace", "comms_buffer_on_gpu", "comms_buffer.on_gpu", "", true);
DeprecatedInput("hipace", "comms_buffer_max_leading_slices",
"comms_buffer.max_leading_slices", "", true);
DeprecatedInput("hipace", "comms_buffer_max_trailing_slices",
"comms_buffer.max_trailing_slices)", "", true);

MakeGeometry();

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_comms_buffer_max_trailing_slices)
* amrex::ParallelDescriptor::NProcs()) > m_3D_geom[0].Domain().length(2))
|| (m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer_max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

m_use_laser = m_multi_laser.m_use_laser;

queryWithParser(pph, "collisions", m_collision_names);
@@ -211,11 +206,8 @@ Hipace::InitData ()

m_multi_buffer.initialize(m_3D_geom[0].Domain().length(2),
m_multi_beam.get_nbeams(),
!m_comms_buffer_on_gpu,
m_use_laser,
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{},
m_comms_buffer_max_leading_slices,
m_comms_buffer_max_trailing_slices);
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{});

amrex::ParmParse pph("hipace");
bool do_output_input = false;
2 changes: 1 addition & 1 deletion src/laser/MultiLaser.cpp
@@ -57,7 +57,7 @@ MultiLaser::ReadParameters ()
if (!m_laser_from_file) {
getWithParser(pp, "lambda0", m_lambda0);
}
DeprecatedInput("lasers", "3d_on_host", "hipace.comms_buffer_on_gpu", "", true);
DeprecatedInput("lasers", "3d_on_host", "comms_buffer.on_gpu", "", true);
queryWithParser(pp, "use_phase", m_use_phase);
queryWithParser(pp, "solver_type", m_solver_type);
AMREX_ALWAYS_ASSERT(m_solver_type == "multigrid" || m_solver_type == "fft");
8 changes: 5 additions & 3 deletions src/utils/MultiBuffer.H
@@ -18,8 +18,7 @@ class MultiBuffer
public:

// initialize MultiBuffer and open initial receive requests
void initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices, int max_trailing_slices);
void initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box);

// receive data from previous rank and unpack it into MultiBeam and MultiLaser
void get_data (int slice, MultiBeam& beams, MultiLaser& laser, int beam_slice);
@@ -107,13 +106,16 @@ private:
MPI_Comm m_comm = MPI_COMM_NULL;

// general parameters
bool m_buffer_on_host = true;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_buffer_on_gpu = false;
int m_nslices = 0;
int m_nbeams = 0;
bool m_use_laser = false;
int m_laser_ncomp = 4;
amrex::Box m_laser_slice_box {};
/** How many slices of beam particles can be received in advance */
int m_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_max_trailing_slices = std::numeric_limits<int>::max();

// parameters to send physical time
21 changes: 14 additions & 7 deletions src/utils/MultiBuffer.cpp
@@ -8,6 +8,7 @@
#include "MultiBuffer.H"
#include "Hipace.H"
#include "HipaceProfilerWrapper.H"
#include "Parser.H"


std::size_t MultiBuffer::get_metadata_size () {
@@ -24,7 +25,7 @@ std::size_t* MultiBuffer::get_metadata_location (int slice) {

void MultiBuffer::allocate_buffer (int slice) {
AMREX_ALWAYS_ASSERT(m_datanodes[slice].m_location == memory_location::nowhere);
if (m_buffer_on_host) {
if (!m_buffer_on_gpu) {
m_datanodes[slice].m_buffer = reinterpret_cast<char*>(amrex::The_Pinned_Arena()->alloc(
m_datanodes[slice].m_buffer_size * sizeof(storage_type)
));
@@ -49,17 +50,16 @@ void MultiBuffer::free_buffer (int slice) {
m_datanodes[slice].m_buffer_size = 0;
}

void MultiBuffer::initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices,
int max_trailing_slices) {
void MultiBuffer::initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box) {

amrex::ParmParse pp("comms_buffer");

m_comm = amrex::ParallelDescriptor::Communicator();
const int rank_id = amrex::ParallelDescriptor::MyProc();
const int n_ranks = amrex::ParallelDescriptor::NProcs();

m_nslices = nslices;
m_nbeams = nbeams;
m_buffer_on_host = buffer_on_host;
m_use_laser = use_laser;
m_laser_slice_box = laser_box;

@@ -73,8 +73,15 @@ void MultiBuffer::initialize (int nslices, int nbeams, bool
m_tag_buffer_start = 1;
m_tag_metadata_start = m_tag_buffer_start + m_nslices;

m_max_leading_slices = max_leading_slices;
m_max_trailing_slices = max_trailing_slices;
queryWithParser(pp, "on_gpu", m_buffer_on_gpu);
queryWithParser(pp, "max_leading_slices", m_max_leading_slices);
queryWithParser(pp, "max_trailing_slices", m_max_trailing_slices);

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_max_trailing_slices) * n_ranks) > nslices)
|| (Hipace::m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer.max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

for (int p = 0; p < comm_progress::nprogress; ++p) {
m_async_metadata_slice[p] = m_nslices - 1;
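
The parameters are now read inside ``MultiBuffer::initialize`` through an ``amrex::ParmParse`` instance constructed with the ``comms_buffer`` prefix. A minimal standalone sketch of that prefix mechanism (an illustration under stated assumptions, not HiPACE code: it uses plain ``amrex::ParmParse::query`` instead of the ``queryWithParser`` helper):

#include <AMReX.H>
#include <AMReX_ParmParse.H>
#include <AMReX_Print.H>
#include <limits>

int main (int argc, char* argv[])
{
    // run as: ./a.out inputs  (or pass "comms_buffer.on_gpu=1" on the command line)
    amrex::Initialize(argc, argv);
    {
        // The "comms_buffer" prefix makes pp.query("on_gpu", ...) look up the
        // input parameter named "comms_buffer.on_gpu".
        amrex::ParmParse pp("comms_buffer");
        bool on_gpu = false;
        int max_trailing_slices = std::numeric_limits<int>::max();
        pp.query("on_gpu", on_gpu);
        pp.query("max_trailing_slices", max_trailing_slices);
        amrex::Print() << "comms_buffer.on_gpu = " << on_gpu << "\n";
    }
    amrex::Finalize();
    return 0;
}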