Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added non-blocking root communicator #1478

Open
wants to merge 25 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
282ae1c
added non-blocking root communicator
gberg617 Dec 11, 2024
43b02da
fixed tag and flag comparison type, renamed NonBlockingRootCommunicat…
gberg617 Dec 13, 2024
7969aa8
small formatting fix
gberg617 Dec 13, 2024
14f802e
added unique mpi tag for each NonCollectiveRootCommunicator instance,…
gberg617 Dec 13, 2024
c04745a
added missing function param description
gberg617 Dec 16, 2024
8275b6d
applied clang-format
gberg617 Jan 16, 2025
0c0da25
fixed noncollective_communication test
gberg617 Jan 16, 2025
1c120d3
small fix for delete
gberg617 Jan 22, 2025
c74d1f4
Update src/axom/lumberjack/MPIUtility.hpp
gberg617 Jan 23, 2025
53c0f4e
Update src/axom/lumberjack/NonCollectiveRootCommunicator.hpp
gberg617 Jan 23, 2025
173bb41
Update src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunic…
gberg617 Jan 23, 2025
d946bf9
added comment and fixed some formatting
gberg617 Jan 23, 2025
3a6c50a
add numeric_limits check for mpiTag
gberg617 Jan 23, 2025
d3bf633
re-run CI commit
gberg617 Jan 23, 2025
ffb6d91
removed mpi tags for NonCollectiveRootCommunicator and added MPI comm…
gberg617 Jan 29, 2025
3a3a52c
apply clang-format
gberg617 Jan 29, 2025
5f416fd
removed unnecessary includes that my fancy IDE decided to add
gberg617 Jan 29, 2025
ae83780
re-trigger tests
gberg617 Jan 29, 2025
fe58609
Merge branch 'develop' into feature/bergel1/lumberjack_nonblocking_co…
rhornung67 Jan 29, 2025
a6b8c63
Update src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunic…
gberg617 Feb 19, 2025
92fd4c8
re-trigger tests
gberg617 Feb 20, 2025
97579e5
Merge branch 'develop' into feature/bergel1/lumberjack_nonblocking_co…
gberg617 Feb 20, 2025
a0f3548
Merge branch 'develop' into feature/bergel1/lumberjack_nonblocking_co…
gberg617 Feb 21, 2025
a3863ea
various small fixes
gberg617 Feb 21, 2025
4a62e2b
more small fixes
gberg617 Feb 21, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/axom/lumberjack/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ set(lumberjack_headers
Message.hpp
MPIUtility.hpp
RootCommunicator.hpp
NonCollectiveRootCommunicator.hpp
TextEqualityCombiner.hpp
TextTagCombiner.hpp
)
Expand All @@ -27,6 +28,7 @@ set(lumberjack_sources
Message.cpp
MPIUtility.cpp
RootCommunicator.cpp
NonCollectiveRootCommunicator.cpp
)


Expand Down
32 changes: 31 additions & 1 deletion src/axom/lumberjack/MPIUtility.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@
*/

#include "axom/lumberjack/MPIUtility.hpp"

#include <cstring>

namespace axom
Expand Down Expand Up @@ -49,6 +48,37 @@ const char* mpiBlockingReceiveMessages(MPI_Comm comm)
return charArray;
}

const char* mpiNonBlockingReceiveMessages(MPI_Comm comm)
{
char* charArray = nullptr;
int messageSize = -1;
MPI_Status mpiStatus;

// Get size and source of MPI message
int mpiFlag = 0;
MPI_Iprobe(MPI_ANY_SOURCE, LJ_TAG, comm, &mpiFlag, &mpiStatus);

if(mpiFlag == 1)
{
MPI_Get_count(&mpiStatus, MPI_CHAR, &messageSize);

// Setup where to receive the char array
charArray = new char[messageSize + 1];
charArray[messageSize] = '\0';

// Receive packed Message
MPI_Recv(charArray,
messageSize,
MPI_CHAR,
mpiStatus.MPI_SOURCE,
LJ_TAG,
comm,
&mpiStatus);
}

return charArray;
}

void mpiNonBlockingSendMessages(MPI_Comm comm,
int destinationRank,
const char* packedMessagesToBeSent)
Expand Down
14 changes: 14 additions & 0 deletions src/axom/lumberjack/MPIUtility.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,18 @@ namespace lumberjack
*/
const char* mpiBlockingReceiveMessages(MPI_Comm comm);

/*!
*****************************************************************************
* \brief Receives any Message sent to this rank, if there are any messages
* that are sent. Returns null if no messages are sent.
Copy link
Contributor

@gunney1 gunney1 Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we return from a non-blocking probe, there could be sent messages that haven't arrived yet. I suggest replacing "are sent" with "have arrived." It's pedantic but can avoid confusion when the unexpected happens.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*
* \param [in] comm The MPI Communicator.
* \param [in] tag The MPI tag to use for communication. When set to zero,
* MPI communication uses default LJ_Tag.
*****************************************************************************
*/
const char* mpiNonBlockingReceiveMessages(MPI_Comm comm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the docs for this function (and mpiBlockingReceiveMessages) to indicate that the caller is responsible for deallocating the memory in the returned pointer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gberg617 We should update docs as @kennyweiss suggested.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a sentence about this to the function documentation,


/*!
*****************************************************************************
* \brief Sends all Message sent to the given rank.
Expand All @@ -40,6 +52,8 @@ const char* mpiBlockingReceiveMessages(MPI_Comm comm);
* \param [in] destinationRank Where the Message classes is being sent.
* \param [in,out] packedMessagesToBeSent All of the Message classes to be sent
* packed together.
* \param [in] tag The MPI tag to use for communication. When set to zero,
* MPI communication uses the default Lumberjack Tag.
*****************************************************************************
*/
void mpiNonBlockingSendMessages(MPI_Comm comm,
Expand Down
96 changes: 96 additions & 0 deletions src/axom/lumberjack/NonCollectiveRootCommunicator.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
// Copyright (c) 2017-2024, Lawrence Livermore National Security, LLC and
// other Axom Project Developers. See the top-level LICENSE file for details.
//
// SPDX-License-Identifier: (BSD-3-Clause)

/*!
******************************************************************************
*
* \file NonCollectiveRootCommunicator.cpp
*
* \brief Implementation of the NonCollectiveRootCommunicator class.
*
******************************************************************************
*/

#include <limits>

#include "axom/lumberjack/NonCollectiveRootCommunicator.hpp"
#include "axom/lumberjack/MPIUtility.hpp"

namespace axom
{
namespace lumberjack
{
void NonCollectiveRootCommunicator::initialize(MPI_Comm comm, int ranksLimit)
{
MPI_Comm_dup(comm, &m_mpiComm);
MPI_Comm_rank(m_mpiComm, &m_mpiCommRank);
MPI_Comm_size(m_mpiComm, &m_mpiCommSize);
m_ranksLimit = ranksLimit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, this needs to be a positive number greater than or equal to 1. Should we check it w/ a SLIC_ASSERT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put in a std::cerr to avoid including anything from SLIC in Lumberjack.

}

void NonCollectiveRootCommunicator::finalize() { MPI_Comm_free(&m_mpiComm); }

int NonCollectiveRootCommunicator::rank() { return m_mpiCommRank; }

void NonCollectiveRootCommunicator::ranksLimit(int value)
{
m_ranksLimit = value;
}

int NonCollectiveRootCommunicator::ranksLimit() { return m_ranksLimit; }

int NonCollectiveRootCommunicator::numPushesToFlush() { return 1; }

void NonCollectiveRootCommunicator::push(
const char* packedMessagesToBeSent,
std::vector<const char*>& receivedPackedMessages)
{
if(m_mpiCommRank == 0)
{
const char* currPackedMessages = nullptr;
bool receive_messages = true;
while(receive_messages)
{
currPackedMessages = mpiNonBlockingReceiveMessages(m_mpiComm);

if(isPackedMessagesEmpty(currPackedMessages))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks odd to me that currPackedMessages is used before checking if it's null.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe isPackedMessagesEmpty() will check to see if currPackedMessages is null.

{
if(currPackedMessages == nullptr)
{
receive_messages = false;
}
else
{
delete[] currPackedMessages;
}
}
else
{
receivedPackedMessages.push_back(currPackedMessages);
}

currPackedMessages = nullptr;
}
}
else
{
if(isPackedMessagesEmpty(packedMessagesToBeSent) == false)
{
mpiNonBlockingSendMessages(m_mpiComm, 0, packedMessagesToBeSent);
}
}
}

bool NonCollectiveRootCommunicator::isOutputNode()
{
if(m_mpiCommRank == 0)
{
return true;
}
return false;
}

} // end namespace lumberjack
} // end namespace axom
135 changes: 135 additions & 0 deletions src/axom/lumberjack/NonCollectiveRootCommunicator.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
// Copyright (c) 2017-2024, Lawrence Livermore National Security, LLC and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After merging in develop, please run our update_copyright_script to update the copyright on the new files to 2025.
See: https://github.com/LLNL/axom/blob/develop/scripts/update_copyright_date.sh

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// other Axom Project Developers. See the top-level LICENSE file for details.
//
// SPDX-License-Identifier: (BSD-3-Clause)

/*!
*******************************************************************************
* \file NonCollectiveRootCommunicator.hpp
*
* \brief This file contains the class definition of the
* NonCollectiveRootCommunicator.
*******************************************************************************
*/

#ifndef NONCOLLECTIVEROOTCOMMUNICATOR_HPP
#define NONCOLLECTIVEROOTCOMMUNICATOR_HPP

#include "axom/lumberjack/Lumberjack.hpp"
#include "axom/lumberjack/Communicator.hpp"

namespace axom
{
namespace lumberjack
{
/*!
*******************************************************************************
* \class NonCollectiveRootCommunicator
*
* \brief Based off of RootCommunicator. This communicator pushes
messages from any rank to root non-collectively, if any messages are sent.
*******************************************************************************
*/
class NonCollectiveRootCommunicator : public axom::lumberjack::Communicator
{
public:
/*!
*****************************************************************************
* \brief Called to initialize the Communicator.
*
* This performs any setup work the Communicator needs before doing any work.
* It is required that this is called before using the Communicator.
*
* \param [in] comm The MPI Communicator
* \param [in] ranksLimit Limit on how many ranks are individually tracked per
* Message.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

Suggested change
* \param [in] ranksLimit Limit on how many ranks are individually tracked per
* Message.
* \param [in] ranksLimit Upper limit on number of ranks are tracked per Message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*****************************************************************************
*/
void initialize(MPI_Comm comm, int ranksLimit);

/*!
*****************************************************************************
* \brief Called to finalize the Communicator.
*
* This performs any cleanup work the Communicator needs to do before going
* away.It is required that this is the last function called by the
* Communicator.
*****************************************************************************
*/
void finalize();

/*!
*****************************************************************************
* \brief Returns the MPI rank of this node
*****************************************************************************
*/
int rank();

/*!
*****************************************************************************
* \brief Sets the rank limit.
*
* This is the limit on how many ranks generated a given message are
* individually tracked per Message. After the limit has been reached, only
* the Message::rankCount is incremented.
*
* \param [in] value Limits how many ranks are tracked per Message.
*****************************************************************************
*/
void ranksLimit(int value);

/*!
*****************************************************************************
* \brief Returns the rank limit.
*
* This is the limit on how many ranks generated a given message are
* individually tracked per Message. After the limit has been reached, only
* the Message::rankCount is incremented.
*****************************************************************************
*/
int ranksLimit();

/*!
*****************************************************************************
* \brief Function used by the Lumberjack class to indicate how many
* individual pushes fully flush all currently held Message classes to the
* root node. The Communicator class's tree structure dictates this.
*****************************************************************************
*/
int numPushesToFlush();

/*!
*****************************************************************************
* \brief This pushes all messages to the root node.
*
* All messages are pushed to the root node. This is the same as
* RootCommunicator::pushMessagesFully for this Communicator.
*
* \param [in] packedMessagesToBeSent All of this rank's Message classes
* packed into a single buffer.
* \param [in,out] receivedPackedMessages Received packed message buffers from
* this nodes children.
*****************************************************************************
*/
void push(const char* packedMessagesToBeSent,
std::vector<const char*>& receivedPackedMessages);

/*!
*****************************************************************************
* \brief Function used by the Lumberjack to indicate whether this node should
* be outputting messages. Only the root node outputs messages.
*****************************************************************************
*/
bool isOutputNode();

private:
MPI_Comm m_mpiComm;
int m_mpiCommRank;
int m_mpiCommSize;
int m_ranksLimit;
};

} // end namespace lumberjack
} // end namespace axom

#endif
3 changes: 2 additions & 1 deletion src/axom/lumberjack/tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,8 @@ endforeach()
#------------------------------------------------------------------------------
set(lumberjack_mpi_tests
lumberjack_BinaryCommunicator.hpp
lumberjack_RootCommunicator.hpp )
lumberjack_RootCommunicator.hpp
lumberjack_NonCollectiveRootCommunicator.hpp )

axom_add_executable(NAME lumberjack_mpi_tests
SOURCES lumberjack_mpi_main.cpp
Expand Down
Loading
Loading