Merge branch 'development' into Reverse_order_of_MPI_ranks
AlexanderSinn committed Mar 15, 2024
2 parents 2be0a02 + 3bb72de commit afe711d
Showing 14 changed files with 133 additions and 46 deletions.
2 changes: 1 addition & 1 deletion docs/source/building/platforms/booster_jsc.rst
@@ -65,7 +65,7 @@ and use it to submit a simulation.

.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set.
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set.

Note that using GPU-aware MPI may require more GPU memory.

2 changes: 1 addition & 1 deletion docs/source/building/platforms/lumi_csc.rst
@@ -101,7 +101,7 @@ and use it to submit a simulation.
.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:
.. code-block:: bash
2 changes: 1 addition & 1 deletion docs/source/building/platforms/maxwell_desy.rst
@@ -70,7 +70,7 @@ for more details and the required constraints). Please set the value accordingly
.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set and the following flag must be passed in the job script:
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set and the following flag must be passed in the job script:

.. code-block:: bash
4 changes: 2 additions & 2 deletions docs/source/building/platforms/perlmutter_nersc.rst
@@ -79,7 +79,7 @@ You can then create your directory in your ``$PSCRATCH``, where you can put your
export MPICH_OFI_NIC_POLICY=GPU
# for GPU-aware MPI use the first line
#HIPACE_GPU_AWARE_MPI="hipace.comms_buffer_on_gpu=1"
#HIPACE_GPU_AWARE_MPI="comms_buffer.on_gpu=1"
HIPACE_GPU_AWARE_MPI=""
# CUDA visible devices are ordered inverse to local task IDs
@@ -94,6 +94,6 @@ and use it to submit a simulation. Note that this example simulation runs on 8

.. tip::
Parallel simulations can be greatly accelerated by using GPU-aware MPI.
To utilize GPU-aware MPI, the input parameter ``hipace.comms_buffer_on_gpu = 1`` must be set (see the job script above).
To utilize GPU-aware MPI, the input parameter ``comms_buffer.on_gpu = 1`` must be set (see the job script above).

Note that using GPU-aware MPI may require more GPU memory.
31 changes: 25 additions & 6 deletions docs/source/run/parameters.rst
@@ -80,24 +80,24 @@ General parameters
By default, we use the ``nosmt`` option, which overrides the OpenMP default of spawning one thread per logical CPU core, and instead only spawns a number of threads equal to the number of physical CPU cores on the machine.
If set, the environment variable ``OMP_NUM_THREADS`` takes precedence over ``system`` and ``nosmt``, but not over integer numbers set in this option.

* ``hipace.comms_buffer_on_gpu`` (`bool`) optional (default `0`)
* ``comms_buffer.on_gpu`` (`bool`) optional (default `0`)
Whether the buffers that hold the beam and the 3D laser envelope should be allocated on the GPU (device memory).
By default they will be allocated on the CPU (pinned memory).
Setting this option to `1` is necessary to take advantage of GPU-aware MPI; however,
additional environment variables need to be set depending on the system.

* ``hipace.comms_buffer_max_leading_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_leading_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be received and stored in advance.

* ``hipace.comms_buffer_max_trailing_slices`` (`int`) optional (default `inf`)
* ``comms_buffer.max_trailing_slices`` (`int`) optional (default `inf`)
How many slices of beam particles can be stored before being sent. Using
``comms_buffer_max_leading_slices`` and ``comms_buffer_max_trailing_slices`` will in principle
``comms_buffer.max_leading_slices`` and ``comms_buffer.max_trailing_slices`` will in principle
limit the amount of asynchrony in the parallel communication and may thus reduce performance.
However, it may be necessary to set these parameters to avoid all slices accumulating on a single
rank that would run out of memory (out of CPU or GPU memory depending on ``hipace.comms_buffer_on_gpu``).
rank that would run out of memory (out of CPU or GPU memory depending on ``comms_buffer.on_gpu``).
If there are more time steps than ranks, these parameters must be chosen such that between all
ranks there is enough capacity to store every slice to avoid a deadlock, i.e.
``comms_buffer_max_trailing_slices * nranks > nslices``.
``comms_buffer.max_trailing_slices * nranks > nslices``.
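For illustration, an example input fragment (the numbers are hypothetical, not from this commit): with
``nranks = 4`` and ``nslices = 1024``, the condition above requires more than 256 trailing slices per rank.

.. code-block:: ini

    comms_buffer.on_gpu = 1                 # allocate buffers in device memory (GPU-aware MPI)
    comms_buffer.max_leading_slices = 512   # bound memory for slices received in advance
    comms_buffer.max_trailing_slices = 512  # 512 * 4 ranks = 2048 > 1024 slices, no deadlock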

* ``hipace.do_tiling`` (`bool`) optional (default `true`)
Whether to use tiling, when running on CPU.
@@ -1044,3 +1044,22 @@ Whether the energy loss due to classical radiation reaction of beam particles is
Whether the beam particles undergo energy loss due to classical radiation reaction.
The implemented radiation reaction model is based on this publication: `M. Tamburini et al., NJP 12, 123005 <https://doi.org/10.1088/1367-2630/12/12/123005>`__
In normalized units, `hipace.background_density_SI` must be specified.

Spin tracking
-------------

Track the spin of each beam particle as it is rotated by the electromagnetic fields using the
Thomas-Bargmann-Michel-Telegdi (TBMT) model, see
[Z. Gong et al., Matter and Radiation at Extremes 8.6 (2023), https://doi.org/10.1063/5.0152382]
for the details of the implementation.
This adds three extra components to each beam particle to store the spin and outputs
them as part of the beam diagnostics.

* ``<beam name> or beams.do_spin_tracking`` (`bool`) optional (default `0`)
Enable spin tracking.

* ``<beam name> or beams.initial_spin`` (3 `float`)
Initial spin ``sx sy sz`` of all particles. The vector is normalized internally to unit length.

* ``<beam name> or beams.spin_anom`` (`float`) optional (default `0.00115965218128`)
The anomalous magnetic moment. The default is the value for electrons.
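A minimal example of enabling spin tracking in an input file (the beam name ``beam`` and the
values shown are placeholders):

.. code-block:: ini

    beam.do_spin_tracking = 1
    beam.initial_spin = 0. 0. 1.        # normalized internally to unit length
    beam.spin_anom = 0.00115965218128   # default: electron anomalous magnetic moment
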
6 changes: 0 additions & 6 deletions src/Hipace.H
@@ -266,12 +266,6 @@ public:
amrex::Parser m_salame_parser;
/** Function to get the target Ez field for SALAME */
amrex::ParserExecutor<3> m_salame_target_func;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_comms_buffer_on_gpu = false;
/** How many slices of beam particles can be received in advance */
int m_comms_buffer_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_comms_buffer_max_trailing_slices = std::numeric_limits<int>::max();

private:

20 changes: 6 additions & 14 deletions src/Hipace.cpp
@@ -140,19 +140,14 @@ Hipace::Hipace () :
#endif

queryWithParser(pph, "background_density_SI", m_background_density_SI);
queryWithParser(pph, "comms_buffer_on_gpu", m_comms_buffer_on_gpu);
queryWithParser(pph, "comms_buffer_max_leading_slices", m_comms_buffer_max_leading_slices);
queryWithParser(pph, "comms_buffer_max_trailing_slices", m_comms_buffer_max_trailing_slices);
DeprecatedInput("hipace", "comms_buffer_on_gpu", "comms_buffer.on_gpu", "", true);
DeprecatedInput("hipace", "comms_buffer_max_leading_slices",
"comms_buffer.max_leading_slices", "", true);
DeprecatedInput("hipace", "comms_buffer_max_trailing_slices",
"comms_buffer.max_trailing_slices)", "", true);

MakeGeometry();

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_comms_buffer_max_trailing_slices)
* amrex::ParallelDescriptor::NProcs()) > m_3D_geom[0].Domain().length(2))
|| (m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer_max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

m_use_laser = m_multi_laser.m_use_laser;

queryWithParser(pph, "collisions", m_collision_names);
@@ -211,11 +206,8 @@ Hipace::InitData ()

m_multi_buffer.initialize(m_3D_geom[0].Domain().length(2),
m_multi_beam.get_nbeams(),
!m_comms_buffer_on_gpu,
m_use_laser,
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{},
m_comms_buffer_max_leading_slices,
m_comms_buffer_max_trailing_slices);
m_use_laser ? m_multi_laser.getSlices()[0].box() : amrex::Box{});

amrex::ParmParse pph("hipace");
bool do_output_input = false;
3 changes: 2 additions & 1 deletion src/diagnostics/OpenPMDWriter.cpp
@@ -298,7 +298,8 @@ OpenPMDWriter::CopyBeams (MultiBeam& beams, const amrex::Vector< std::string > b
m_uint64_beam_data[ibeam][idx].get() + m_offset[ibeam]);
}

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(m_real_beam_data[ibeam].size() == soa.NumRealComps(),
AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
int(m_real_beam_data[ibeam].size()) == soa.NumRealComps(),
"List of real names in openPMD Writer class does not match the beam");

for (std::size_t idx=0; idx<m_real_beam_data[ibeam].size(); idx++) {
2 changes: 1 addition & 1 deletion src/laser/MultiLaser.cpp
@@ -57,7 +57,7 @@ MultiLaser::ReadParameters ()
if (!m_laser_from_file) {
getWithParser(pp, "lambda0", m_lambda0);
}
DeprecatedInput("lasers", "3d_on_host", "hipace.comms_buffer_on_gpu", "", true);
DeprecatedInput("lasers", "3d_on_host", "comms_buffer.on_gpu", "", true);
queryWithParser(pp, "use_phase", m_use_phase);
queryWithParser(pp, "solver_type", m_solver_type);
AMREX_ALWAYS_ASSERT(m_solver_type == "multigrid" || m_solver_type == "fft");
18 changes: 15 additions & 3 deletions src/particles/beam/BeamParticleContainer.H
@@ -191,12 +191,20 @@ public:
return m_total_num_particles;
}

int numRealComponents () const { return BeamIdx::real_nattribs; }
int numRealComponents () const {
return BeamIdx::real_nattribs + (m_do_spin_tracking ? 3 : 0);
}
int numIntComponents () const { return BeamIdx::int_nattribs; }

bool communicateIdCpuComponent () const { return true; }
bool communicateRealComponent (int rcomp) const { return rcomp < BeamIdx::real_nattribs_in_buffer; }
bool communicateIntComponent (int icomp) const { return icomp < BeamIdx::int_nattribs_in_buffer; }
bool communicateRealComponent (int rcomp) const {
// communicate all compile-time and runtime real components
return rcomp < numRealComponents();
}
bool communicateIntComponent (int icomp) const {
// don't communicate nsubcycles
return icomp < BeamIdx::int_nattribs_in_buffer;
}

private:
int m_slice_permutation = 0;
@@ -228,6 +236,10 @@ public:
amrex::Array<amrex::Parser, 6> m_external_fields_parser;
/** If spin tracking is enabled for this beam */
bool m_do_spin_tracking = false;
/** Initial spin of all particles */
amrex::RealVect m_initial_spin = {1, 0, 0};
/** The anomalous magnetic moment */
amrex::Real m_spin_anom = 0.00115965218128;
private:
std::string m_name; /**< name of the species */
/** injection type, fixed_width or fixed_ppc */
23 changes: 23 additions & 0 deletions src/particles/beam/BeamParticleContainer.cpp
@@ -103,6 +103,15 @@ BeamParticleContainer::ReadParameters ()
soa.GetIntData()[icomp].setArena(
m_initialize_on_cpu ? amrex::The_Pinned_Arena() : amrex::The_Arena());
}
queryWithParserAlt(pp, "do_spin_tracking", m_do_spin_tracking, pp_alt);
if (m_do_spin_tracking) {
getWithParserAlt(pp, "initial_spin", m_initial_spin, pp_alt);
queryWithParserAlt(pp, "spin_anom", m_spin_anom, pp_alt);
for (auto& beam_tile : m_slices) {
// Use 3 real and 0 int runtime components
beam_tile.define(3, 0);
}
}
}

amrex::Real
@@ -383,6 +392,20 @@ BeamParticleContainer::intializeSlice (int slice, int which_slice) {
}
);
}

if (m_do_spin_tracking) {
auto ptd = getBeamSlice(which_slice).getParticleTileData();

const amrex::RealVect initial_spin_norm = m_initial_spin / m_initial_spin.vectorLength();

amrex::ParallelFor(getNumParticles(which_slice),
[=] AMREX_GPU_DEVICE (const int ip) {
ptd.m_runtime_rdata[0][ip] = initial_spin_norm[0];
ptd.m_runtime_rdata[1][ip] = initial_spin_norm[1];
ptd.m_runtime_rdata[2][ip] = initial_spin_norm[2];
}
);
}
}

void
37 changes: 37 additions & 0 deletions src/particles/pusher/BeamParticleAdvance.cpp
@@ -32,6 +32,8 @@ AdvanceBeamParticlesSlice (
const amrex::Real dt = Hipace::GetInstance().m_dt / n_subcycles;
const amrex::Real background_density_SI = Hipace::m_background_density_SI;
const bool normalized_units = Hipace::m_normalized_units;
const bool spin_tracking = beam.m_do_spin_tracking;
const amrex::Real spin_anom = beam.m_spin_anom;

if (normalized_units && radiation_reaction) {
AMREX_ALWAYS_ASSERT_WITH_MESSAGE(background_density_SI!=0,
@@ -140,6 +142,13 @@

int i = ptd.idata(BeamIdx::nsubcycles)[ip];

amrex::RealVect spin {0._rt, 0._rt, 0._rt};
if (spin_tracking) {
spin[0] = ptd.m_runtime_rdata[0][ip];
spin[1] = ptd.m_runtime_rdata[1][ip];
spin[2] = ptd.m_runtime_rdata[2][ip];
}

for (; i < n_subcycles; i++) {

if (zp < min_z) {
@@ -214,6 +223,28 @@
+ uy_intermediate*uy_intermediate
+ uz_intermediate*uz_intermediate )*inv_c2 );

if (spin_tracking) {
const amrex::RealVect E {ExmByp + clight*Byp, EypBxp - clight*Bxp, Ezp};
const amrex::RealVect B {Bxp, Byp, Bzp};
const amrex::RealVect u {ux_intermediate*inv_clight, uy_intermediate*inv_clight,
uz_intermediate*inv_clight};
const amrex::RealVect beta = u*gamma_intermediate_inv;
const amrex::Real gamma_inv_p1 =
gamma_intermediate_inv / (1._rt + gamma_intermediate_inv);

const amrex::RealVect omega = std::abs(charge_mass_ratio) * (
B * gamma_intermediate_inv - beta.crossProduct(E) * inv_clight * gamma_inv_p1
+ spin_anom * (
B - gamma_inv_p1 * u * beta.dotProduct(B) - beta.crossProduct(E) * inv_clight
)
);

const amrex::RealVect h = omega * dt * 0.5_rt;
const amrex::RealVect s_prime = spin + h.crossProduct(spin);
const amrex::Real o = 1._rt / (1._rt + h.dotProduct(h));
spin = o * (s_prime + (h.dotProduct(s_prime) * h + h.crossProduct(s_prime)));
}

amrex::ParticleReal uz_next = uz + dt * charge_mass_ratio
* ( Ezp + ( ux_intermediate * Byp - uy_intermediate * Bxp )
* gamma_intermediate_inv );
@@ -300,5 +331,11 @@ AdvanceBeamParticlesSlice (
ptd.rdata(BeamIdx::ux)[ip] = ux;
ptd.rdata(BeamIdx::uy)[ip] = uy;
ptd.rdata(BeamIdx::uz)[ip] = uz;

if (spin_tracking) {
ptd.m_runtime_rdata[0][ip] = spin[0];
ptd.m_runtime_rdata[1][ip] = spin[1];
ptd.m_runtime_rdata[2][ip] = spin[2];
}
});
}
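For reference, the spin push above computes the TBMT precession vector and applies a norm-preserving
rotation. In the notation of the code (:math:`\mathbf{u}` the momentum normalized by :math:`mc`,
:math:`\boldsymbol{\beta} = \mathbf{u}/\gamma` at the intermediate gamma, :math:`a` the anomalous
moment), the update appears to correspond to, up to the sign absorbed by ``std::abs(charge_mass_ratio)``:

.. math::

    \boldsymbol{\Omega} = \left|\frac{q}{m}\right| \left[ \left(a + \frac{1}{\gamma}\right)\mathbf{B}
    - \frac{a\,\gamma}{\gamma + 1}\,(\boldsymbol{\beta}\cdot\mathbf{B})\,\boldsymbol{\beta}
    - \left(a + \frac{1}{\gamma + 1}\right)\frac{\boldsymbol{\beta}\times\mathbf{E}}{c} \right]

followed by an exact rotation with half-step vector :math:`\mathbf{h} = \boldsymbol{\Omega}\,\Delta t / 2`:

.. math::

    \mathbf{s}' = \mathbf{s} + \mathbf{h}\times\mathbf{s}, \qquad
    \mathbf{s}^{n+1} = \frac{\mathbf{s}' + (\mathbf{h}\cdot\mathbf{s}')\,\mathbf{h}
    + \mathbf{h}\times\mathbf{s}'}{1 + \mathbf{h}\cdot\mathbf{h}}

This form preserves the unit length of the spin vector and is consistent with the TBMT model
referenced in the spin-tracking documentation above.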
8 changes: 5 additions & 3 deletions src/utils/MultiBuffer.H
@@ -18,8 +18,7 @@ class MultiBuffer
public:

// initialize MultiBuffer and open initial receive requests
void initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices, int max_trailing_slices);
void initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box);

// receive data from previous rank and unpack it into MultiBeam and MultiLaser
void get_data (int slice, MultiBeam& beams, MultiLaser& laser, int beam_slice);
@@ -107,13 +106,16 @@ private:
MPI_Comm m_comm = MPI_COMM_NULL;

// general parameters
bool m_buffer_on_host = true;
/** Whether MPI communication buffers should be allocated in device memory */
bool m_buffer_on_gpu = false;
int m_nslices = 0;
int m_nbeams = 0;
bool m_use_laser = false;
int m_laser_ncomp = 4;
amrex::Box m_laser_slice_box {};
/** How many slices of beam particles can be received in advance */
int m_max_leading_slices = std::numeric_limits<int>::max();
/** How many slices of beam particles can be stored before being sent */
int m_max_trailing_slices = std::numeric_limits<int>::max();

// parameters to send physical time
21 changes: 14 additions & 7 deletions src/utils/MultiBuffer.cpp
@@ -8,6 +8,7 @@
#include "MultiBuffer.H"
#include "Hipace.H"
#include "HipaceProfilerWrapper.H"
#include "Parser.H"


std::size_t MultiBuffer::get_metadata_size () {
@@ -24,7 +25,7 @@ std::size_t* MultiBuffer::get_metadata_location (int slice) {

void MultiBuffer::allocate_buffer (int slice) {
AMREX_ALWAYS_ASSERT(m_datanodes[slice].m_location == memory_location::nowhere);
if (m_buffer_on_host) {
if (!m_buffer_on_gpu) {
m_datanodes[slice].m_buffer = reinterpret_cast<char*>(amrex::The_Pinned_Arena()->alloc(
m_datanodes[slice].m_buffer_size * sizeof(storage_type)
));
@@ -49,17 +50,16 @@ void MultiBuffer::free_buffer (int slice) {
m_datanodes[slice].m_buffer_size = 0;
}

void MultiBuffer::initialize (int nslices, int nbeams, bool buffer_on_host, bool use_laser,
amrex::Box laser_box, int max_leading_slices,
int max_trailing_slices) {
void MultiBuffer::initialize (int nslices, int nbeams, bool use_laser, amrex::Box laser_box) {

amrex::ParmParse pp("comms_buffer");

m_comm = amrex::ParallelDescriptor::Communicator();
const int rank_id = amrex::ParallelDescriptor::MyProc();
const int n_ranks = amrex::ParallelDescriptor::NProcs();

m_nslices = nslices;
m_nbeams = nbeams;
m_buffer_on_host = buffer_on_host;
m_use_laser = use_laser;
m_laser_slice_box = laser_box;

@@ -73,8 +73,15 @@ void MultiBuffer::initialize (int nslices, int nbeams, bool buffer_on_host, bool
m_tag_buffer_start = 1;
m_tag_metadata_start = m_tag_buffer_start + m_nslices;

m_max_leading_slices = max_leading_slices;
m_max_trailing_slices = max_trailing_slices;
queryWithParser(pp, "on_gpu", m_buffer_on_gpu);
queryWithParser(pp, "max_leading_slices", m_max_leading_slices);
queryWithParser(pp, "max_trailing_slices", m_max_trailing_slices);

AMREX_ALWAYS_ASSERT_WITH_MESSAGE(
((double(m_max_trailing_slices) * n_ranks) > nslices)
|| (Hipace::m_max_step < amrex::ParallelDescriptor::NProcs()),
"comms_buffer.max_trailing_slices must be large enough"
" to distribute all slices between all ranks if there are more timesteps than ranks");

for (int p = 0; p < comm_progress::nprogress; ++p) {
m_async_metadata_slice[p] = m_nslices - 1;
