Merge branch 'development' into Reverse_order_of_MPI_ranks
AlexanderSinn committed Mar 15, 2024
2 parents 2be0a02 + 3bb72de commit afe711d
Showing 14 changed files with 133 additions and 46 deletions.
2 changes: 1 addition & 1 deletion docs/source/building/platforms/booster_jsc.rst
@@ -65,7 +65,7 @@ and use it to submit a simulation.

.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set.
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set.

Note that using GPU-aware MPI may require more GPU memory.

2 changes: 1 addition & 1 deletion docs/source/building/platforms/lumi_csc.rst
@@ -101,7 +101,7 @@ and use it to submit a simulation.
.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:
.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/source/building/platforms/maxwell_desy.rst
@@ -70,7 +70,7 @@ for more details and the required constraints). Please set the value accordingly
.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:

.. code-block:: bash
4 changes: 2 additions & 2 deletions docs/source/building/platforms/perlmutter_nersc.rst
@@ -79,7 +79,7 @@ You can then create your directory in your ``$PSCRATCH``, where you can put your
export MPICH_OFI_NIC_POLICY=GPU
# for GPU-aware MPI use the first line
#HIPACE_GPU_AWARE_MPI="hipace.comms_buffer_on_gpu=1"
#HIPACE_GPU_AWARE_MPI="comms_buffer.on_gpu=1"
HIPACE_GPU_AWARE_MPI=""
# CUDA visible devices are ordered inverse to local task IDs
@@ -94,6 +94,6 @@ and use it to submit a simulation. Note that this example simulation runs on 8

.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set (see the job script above).
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set (see the job script above).

Note that using GPU-aware MPI may require more GPU memory.
31 changes: 25 additions & 6 deletions docs/source/run/parameters.rst
@@ -80,24 +80,24 @@ General parameters
By default, we use the ``nosmt`` option, which overrides the OpenMP default of spawning one thread per logical CPU core, and instead only spawns a number of threads equal to the number of physical CPU cores on the machine.
If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.

* ``hipace.comms_buffer_on_gpu`` (`bool`) optional (default `0`)
* ``comms_buffer.on_gpu`` (`bool`) optional (default `0`)
Whether the buffers that hold the beam and the 3D laser envelope should be allocated on the GPU (device memory).
By default they will be allocated on the CPU (pinned memory).
Setting this option to `1` is necessary to take advantage of GPU-aware MPI; however,
additional environment variables need to be set depending on the system.

* ``hipace.comms_buffer_max_leading_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_leading_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be received and stored in advance.

* ``hipace.comms_buffer_max_trailing_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_trailing_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be stored before being sent. Using
``comms_buffer_max_leading_slices`` and ``comms_buffer_max_trailing_slices`` will in principle
``comms_buffer.max_leading_slices`` and ``comms_buffer.max_trailing_slices`` will in principle
limit the amount of asynchrony in the parallel communication and may thus reduce performance.
However, it may be necessary to set these parameters to avoid all slices accumulating on a single
rank that would run out of memory (out of CPU or GPU memory depending on ``hipace.comms_buffer_on_gpu``).
rank that would run out of memory (out of CPU or GPU memory depending on ``comms_buffer.on_gpu``).
If there are more time steps than ranks, these parameters must be chosen such that between all
ranks there is enough capacity to store every slice to avoid a deadlock, i.e.
``comms_buffer_max_trailing_slices * nranks > nslices``.
``comms_buffer.max_trailing_slices * nranks > nslices``.
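For illustration, an example input fragment (the numbers are hypothetical, not from this commit): with
``nranks = 4`` and ``nslices = 1024``, the condition above requires more than 256 trailing slices per rank.

.. code-block:: ini

    comms_buffer.on_gpu = 1                 # allocate buffers in device memory (GPU-aware MPI)
    comms_buffer.max_leading_slices = 512   # bound memory for slices received in advance
    comms_buffer.max_trailing_slices = 512  # 512 * 4 ranks = 2048 > 1024 slices, no deadlock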

* ``hipace.do_tiling`` (`bool`) optional (default `true`)
Whether to use tiling, when running on CPU.
@@ -1044,3 +1044,22 @@ Whether the energy loss due to classical radiation reaction of beam particles is
Whether the beam particles undergo energy loss due to classical radiation reaction.
The implemented radiation reaction model is based on this publication: `M. Tamburini et al., NJP 12, 123005 <https://doi.org/10.1088/1367-2630/12/12/123005>`__
In normalized units, `hipace.background_density_SI` must be specified.

Spin tracking
-------------

Track the spin of each beam particle as it is rotated by the electromagnetic fields using the
Thomas-Bargmann-Michel-Telegdi (TBMT) model, see
[Z. Gong et al., Matter and Radiation at Extremes 8.6 (2023), https://doi.org/10.1063/5.0152382]
for the details of the implementation.
This adds three extra components to each beam particle to store the spin and outputs
them as part of the beam diagnostics.

* ``<beam name> or beams.do_spin_tracking`` (`bool`) optional (default `0`)
Enable spin tracking.

* ``<beam name> or beams.initial_spin`` (3 `float`)
Initial spin ``sx sy sz`` of all particles. The vector is normalized internally to unit length.

* ``<beam name> or beams.spin_anom`` (`float`) optional (default `0.00115965218128`)
The anomalous magnetic moment. The default is the value for electrons.
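A minimal example of enabling spin tracking in an input file (the beam name ``beam`` and the
values shown are placeholders):

.. code-block:: ini

    beam.do_spin_tracking = 1
    beam.initial_spin = 0. 0. 1.        # normalized internally to unit length
    beam.spin_anom = 0.00115965218128   # default: electron anomalous magnetic moment
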
6 changes: 0 additions & 6 deletions src/Hipace.H
@@ -266,12 +266,6 @@ public:
amrex::Parser m_salame_parser;
/** Function to get the target Ez field for SALAME */
amrex::ParserExecutor<3> m_salame_target_func;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_comms_buffer_on_gpu = false;
/** How many slices of beam particles can be received in advance */
int m_comms_buffer_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_comms_buffer_max_trailing_slices = std::numeric_limits<int>::max();

private:

20 changes: 6 additions & 14 deletions src/Hipace.cpp
@@ -140,19 +140,14 @@ Hipace::Hipace () :
#endif

queryWithParser(pph, "background_density_SI", m_background_density_SI);
queryWithParser(pph, "comms_buffer_on_gpu", m_comms_buffer_on_gpu);
queryWithParser(pph, "comms_buffer_max_leading_slices", m_comms_buffer_max_leading_slices);
queryWithParser(pph, "comms_buffer_max_trailing_slices", m_comms_buffer_max_trailing_slices);
DeprecatedInput("hipace", "comms_buffer_on_gpu", "comms_buffer.on_gpu", "", true);
DeprecatedInput("hipace", "comms_buffer_max_leading_slices",
"comms_buffer.max_leading_slices", "", true);
DeprecatedInput("hipace", "comms_buffer_max_trailing_slices",
"comms_buffer.max_trailing_slices)", "", true);

MakeGeometry();

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_comms_buffer_max_trailing_slices)
* amrex::ParallelDescriptor::NProcs()) > m_3D_geom[0].Domain().length(2))
|| (m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer_max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

m_use_laser = m_multi_laser.m_use_laser;

queryWithParser(pph, "collisions", m_collision_names);
@@ -211,11 +206,8 @@ Hipace::InitData ()

m_multi_buffer.initialize(m_3D_geom[0].Domain().length(2),
m_multi_beam.get_nbeams(),
!m_comms_buffer_on_gpu,
m_use_laser,
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{},
m_comms_buffer_max_leading_slices,
m_comms_buffer_max_trailing_slices);
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{});

amrex::ParmParse pph("hipace");
bool do_output_input = false;
3 changes: 2 additions & 1 deletion src/diagnostics/OpenPMDWriter.cpp
@@ -298,7 +298,8 @@ OpenPMDWriter::CopyBeams (MultiBeam& beams, const amrex::Vector< std::string > b
m_uint64_beam_data[ibeam][idx].get() + m_offset[ibeam]);
}

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(m_real_beam_data[ibeam].size() == soa.NumRealComps(),
AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
int(m_real_beam_data[ibeam].size()) == soa.NumRealComps(),
"List of real names in openPMD Writer class does not match the beam");

for (std::size_t idx=0; idx<m_real_beam_data[ibeam].size(); idx++) {
2 changes: 1 addition & 1 deletion src/laser/MultiLaser.cpp
@@ -57,7 +57,7 @@ MultiLaser::ReadParameters ()
if (!m_laser_from_file) {
getWithParser(pp, "lambda0", m_lambda0);
}
DeprecatedInput("lasers", "3d_on_host", "hipace.comms_buffer_on_gpu", "", true);
DeprecatedInput("lasers", "3d_on_host", "comms_buffer.on_gpu", "", true);
queryWithParser(pp, "use_phase", m_use_phase);
queryWithParser(pp, "solver_type", m_solver_type);
AMREX_ALWAYS_ASSERT(m_solver_type == "multigrid" || m_solver_type == "fft");
18 changes: 15 additions & 3 deletions src/particles/beam/BeamParticleContainer.H
@@ -191,12 +191,20 @@ public:
return m_total_num_particles;
}

int numRealComponents () const { return BeamIdx::real_nattribs; }
int numRealComponents () const {
return BeamIdx::real_nattribs + (m_do_spin_tracking ? 3 : 0);
}
int numIntComponents () const { return BeamIdx::int_nattribs; }

bool communicateIdCpuComponent () const { return true; }
bool communicateRealComponent (int rcomp) const { return rcomp < BeamIdx::real_nattribs_in_buffer; }
bool communicateIntComponent (int icomp) const { return icomp < BeamIdx::int_nattribs_in_buffer; }
bool communicateRealComponent (int rcomp) const {
// communicate all compile-time and runtime real components
return rcomp < numRealComponents();
}
bool communicateIntComponent (int icomp) const {
// don't communicate nsubcycles
return icomp < BeamIdx::int_nattribs_in_buffer;
}

private:
int m_slice_permutation = 0;
@@ -228,6 +236,10 @@ public:
amrex::Array<amrex::Parser, 6> m_external_fields_parser;
/** If spin tracking is enabled for this beam */
bool m_do_spin_tracking = false;
/** Initial spin of all particles */
amrex::RealVect m_initial_spin = {1, 0, 0};
/** The anomalous magnetic moment */
amrex::Real m_spin_anom = 0.00115965218128;
private:
std::string m_name; /**< name of the species */
/** injection type, fixed_width or fixed_ppc */
23 changes: 23 additions & 0 deletions src/particles/beam/BeamParticleContainer.cpp
@@ -103,6 +103,15 @@ BeamParticleContainer::ReadParameters ()
soa.GetIntData()[icomp].setArena(
m_initialize_on_cpu ? amrex::The_Pinned_Arena() : amrex::The_Arena());
}
queryWithParserAlt(pp, "do_spin_tracking", m_do_spin_tracking, pp_alt);
if (m_do_spin_tracking) {
getWithParserAlt(pp, "initial_spin", m_initial_spin, pp_alt);
queryWithParserAlt(pp, "spin_anom", m_spin_anom, pp_alt);
for (auto& beam_tile : m_slices) {
// Use 3 real and 0 int runtime components
beam_tile.define(3, 0);
}
}
}

amrex::Real
@@ -383,6 +392,20 @@ BeamParticleContainer::intializeSlice (int slice, int which_slice) {
}
);
}

if (m_do_spin_tracking) {
auto ptd = getBeamSlice(which_slice).getParticleTileData();

const amrex::RealVect initial_spin_norm = m_initial_spin / m_initial_spin.vectorLength();

amrex::ParallelFor(getNumParticles(which_slice),
[=] AMREX_GPU_DEVICE (const int ip) {
ptd.m_runtime_rdata[0][ip] = initial_spin_norm[0];
ptd.m_runtime_rdata[1][ip] = initial_spin_norm[1];
ptd.m_runtime_rdata[2][ip] = initial_spin_norm[2];
}
);
}
}

void
37 changes: 37 additions & 0 deletions src/particles/pusher/BeamParticleAdvance.cpp
@@ -32,6 +32,8 @@ AdvanceBeamParticlesSlice (
const amrex::Real dt = Hipace::GetInstance().m_dt / n_subcycles;
const amrex::Real background_density_SI = Hipace::m_background_density_SI;
const bool normalized_units = Hipace::m_normalized_units;
const bool spin_tracking = beam.m_do_spin_tracking;
const amrex::Real spin_anom = beam.m_spin_anom;

if (normalized_units && radiation_reaction) {
AMREX_ALWAYS_ASSERT_WITH_MESSAGE(background_density_SI!=0,
@@ -140,6 +142,13 @@

int i = ptd.idata(BeamIdx::nsubcycles)[ip];

amrex::RealVect spin {0._rt, 0._rt, 0._rt};
if (spin_tracking) {
spin[0] = ptd.m_runtime_rdata[0][ip];
spin[1] = ptd.m_runtime_rdata[1][ip];
spin[2] = ptd.m_runtime_rdata[2][ip];
}

for (; i < n_subcycles; i++) {

if (zp < min_z) {
@@ -214,6 +223,28 @@
+ uy_intermediate*uy_intermediate
+ uz_intermediate*uz_intermediate )*inv_c2 );

if (spin_tracking) {
const amrex::RealVect E {ExmByp + clight*Byp, EypBxp - clight*Bxp, Ezp};
const amrex::RealVect B {Bxp, Byp, Bzp};
const amrex::RealVect u {ux_intermediate*inv_clight, uy_intermediate*inv_clight,
uz_intermediate*inv_clight};
const amrex::RealVect beta = u*gamma_intermediate_inv;
const amrex::Real gamma_inv_p1 =
gamma_intermediate_inv / (1._rt + gamma_intermediate_inv);

const amrex::RealVect omega = std::abs(charge_mass_ratio) * (
B * gamma_intermediate_inv - beta.crossProduct(E) * inv_clight * gamma_inv_p1
+ spin_anom * (
B - gamma_inv_p1 * u * beta.dotProduct(B) - beta.crossProduct(E) * inv_clight
)
);

const amrex::RealVect h = omega * dt * 0.5_rt;
const amrex::RealVect s_prime = spin + h.crossProduct(spin);
const amrex::Real o = 1._rt / (1._rt + h.dotProduct(h));
spin = o * (s_prime + (h.dotProduct(s_prime) * h + h.crossProduct(s_prime)));
}

amrex::ParticleReal uz_next = uz + dt * charge_mass_ratio
* ( Ezp + ( ux_intermediate * Byp - uy_intermediate * Bxp )
* gamma_intermediate_inv );
@@ -300,5 +331,11 @@ AdvanceBeamParticlesSlice (
ptd.rdata(BeamIdx::ux)[ip] = ux;
ptd.rdata(BeamIdx::uy)[ip] = uy;
ptd.rdata(BeamIdx::uz)[ip] = uz;

if (spin_tracking) {
ptd.m_runtime_rdata[0][ip] = spin[0];
ptd.m_runtime_rdata[1][ip] = spin[1];
ptd.m_runtime_rdata[2][ip] = spin[2];
}
});
}
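For reference, the spin push above computes the TBMT precession vector and applies a norm-preserving
rotation. In the notation of the code (:math:`\mathbf{u}` the momentum normalized by :math:`mc`,
:math:`\boldsymbol{\beta} = \mathbf{u}/\gamma` at the intermediate gamma, :math:`a` the anomalous
moment), the update appears to correspond to, up to the sign absorbed by ``std::abs(charge_mass_ratio)``:

.. math::

    \boldsymbol{\Omega} = \left|\frac{q}{m}\right| \left[ \left(a + \frac{1}{\gamma}\right)\mathbf{B}
    - \frac{a\,\gamma}{\gamma + 1}\,(\boldsymbol{\beta}\cdot\mathbf{B})\,\boldsymbol{\beta}
    - \left(a + \frac{1}{\gamma + 1}\right)\frac{\boldsymbol{\beta}\times\mathbf{E}}{c} \right]

followed by an exact rotation with half-step vector :math:`\mathbf{h} = \boldsymbol{\Omega}\,\Delta t / 2`:

.. math::

    \mathbf{s}' = \mathbf{s} + \mathbf{h}\times\mathbf{s}, \qquad
    \mathbf{s}^{n+1} = \frac{\mathbf{s}' + (\mathbf{h}\cdot\mathbf{s}')\,\mathbf{h}
    + \mathbf{h}\times\mathbf{s}'}{1 + \mathbf{h}\cdot\mathbf{h}}

This form preserves the unit length of the spin vector and is consistent with the TBMT model
referenced in the spin-tracking documentation above.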
8 changes: 5 additions & 3 deletions src/utils/MultiBuffer.H
@@ -18,8 +18,7 @@ class MultiBuffer
public:

// initialize MultiBuffer and open initial receive requests
void initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices, int max_trailing_slices);
void initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box);

// receive data from previous rank and unpack it into MultiBeam and MultiLaser
void get_data (int slice, MultiBeam& beams, MultiLaser& laser, int beam_slice);
@@ -107,13 +106,16 @@ private:
MPI_Comm m_comm = MPI_COMM_NULL;

// general parameters
bool m_buffer_on_host = true;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_buffer_on_gpu = false;
int m_nslices = 0;
int m_nbeams = 0;
bool m_use_laser = false;
int m_laser_ncomp = 4;
amrex::Box m_laser_slice_box {};
/** How many slices of beam particles can be received in advance */
int m_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_max_trailing_slices = std::numeric_limits<int>::max();

// parameters to send physical time
21 changes: 14 additions & 7 deletions src/utils/MultiBuffer.cpp
@@ -8,6 +8,7 @@
#include "MultiBuffer.H"
#include "Hipace.H"
#include "HipaceProfilerWrapper.H"
#include "Parser.H"


std::size_t MultiBuffer::get_metadata_size () {
@@ -24,7 +25,7 @@ std::size_t* MultiBuffer::get_metadata_location (int slice) {

void MultiBuffer::allocate_buffer (int slice) {
AMREX_ALWAYS_ASSERT(m_datanodes[slice].m_location == memory_location::nowhere);
if (m_buffer_on_host) {
if (!m_buffer_on_gpu) {
m_datanodes[slice].m_buffer = reinterpret_cast<char*>(amrex::The_Pinned_Arena()->alloc(
m_datanodes[slice].m_buffer_size * sizeof(storage_type)
));
@@ -49,17 +50,16 @@ void MultiBuffer::free_buffer (int slice) {
m_datanodes[slice].m_buffer_size = 0;
}

void MultiBuffer::initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices,
int max_trailing_slices) {
void MultiBuffer::initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box) {

amrex::ParmParse pp("comms_buffer");

m_comm = amrex::ParallelDescriptor::Communicator();
const int rank_id = amrex::ParallelDescriptor::MyProc();
const int n_ranks = amrex::ParallelDescriptor::NProcs();

m_nslices = nslices;
m_nbeams = nbeams;
m_buffer_on_host = buffer_on_host;
m_use_laser = use_laser;
m_laser_slice_box = laser_box;

@@ -73,8 +73,15 @@ void MultiBuffer::initialize (int nslices, int nbeams, bool buffer_on_host, bool
m_tag_buffer_start = 1;
m_tag_metadata_start = m_tag_buffer_start + m_nslices;

m_max_leading_slices = max_leading_slices;
m_max_trailing_slices = max_trailing_slices;
queryWithParser(pp, "on_gpu", m_buffer_on_gpu);
queryWithParser(pp, "max_leading_slices", m_max_leading_slices);
queryWithParser(pp, "max_trailing_slices", m_max_trailing_slices);

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_max_trailing_slices) * n_ranks) > nslices)
|| (Hipace::m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer.max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

for (int p = 0; p < comm_progress::nprogress; ++p) {
m_async_metadata_slice[p] = m_nslices - 1;
