added non-blocking root communicator #1478

Open: wants to merge 21 commits into base: develop
Conversation

gberg617

Summary

This PR adds a communicator that sends messages from any rank to the root rank non-collectively. This is useful when an arbitrary rank throws an error that must be sent to the root rank so it can be written to a file.

@gberg617
Author

Unit testing and documentation will be added to this PR in follow-up commits.

MPI_Status mpiStatus;

// Get size and source of MPI message
int mpiFlag = true;
Member

@rhornung67 Dec 11, 2024

Similar comment here and below. You could define static constexpr integer variables that use names containing true and false to make the code more readable and avoid magic numbers.


// Get size and source of MPI message
int mpiFlag = true;
MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &mpiFlag, &mpiStatus);
Contributor

MPI_Iprobe is nonblocking here, so is there a chance that mpiFlag is not set to true when it is expected to be? Would it be better to use a blocking MPI_Probe instead? I'm basing this comment on this Stack Overflow post: https://stackoverflow.com/questions/43823458/mpi-iprobe-vs-mpi-probe

Additionally, if using MPI_Iprobe, should mpiFlag default to false, so that it is set to true only by a successful function call?

Author

I think mpiFlag will be set to either true or false in either case, but to your point, it is safer to initialize it to false.

Author

The Stack Overflow example illustrates an interesting but slightly different approach from what I intend. It calls MPI_Iprobe in a while loop that does not exit until the returned flag is non-zero. In my case, I check only once whether any messages need to be received, and if there are none, the function exits by returning nullptr. The intent in the Stack Overflow example is to continuously monitor the status, whereas I only intend to monitor it periodically, whenever the code path enters this function.

Both could be relevant to the problem I'm trying to solve with this communicator, where the root rank needs to receive information from other ranks that they are aborting. I had a preference toward the latter option (periodically monitoring the status whenever the root rank reaches a point where it enters this code path) because it seemed to me like the more efficient option, even if it comes at a cost of sometimes not receiving the status before the program aborts. But I'm not really sure which option is best for this scenario. I'd be curious to hear your thoughts.

Contributor

> I had a preference toward the latter option (periodically monitoring the status whenever the root rank reaches a point where it enters this code path) because it seemed to me like the more efficient option, even if it comes at a cost of sometimes not receiving the status before the program aborts.

I agree, I would expect the latter option to have less overhead, doing a single poll with MPI_Iprobe instead of spinning on MPI_Iprobe until status is updated in the former case. Nevertheless, I might not be considering something, so am also curious if others have ideas.
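
The trade-off discussed in this thread can be sketched without MPI. This is only an illustration under stated assumptions, not the code in this PR: `Probe`, `pollOnce`, `spinUntilPending`, `arrivesOnCall`, and the two flag constants are hypothetical names, and `Probe` merely stands in for the role of `MPI_Iprobe` (set a flag and return without blocking).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical stand-in for MPI_Iprobe: writes non-zero to *flag when a
// message is pending, zero otherwise, and returns without blocking.
using Probe = std::function<void(int* flag)>;

// Named constants instead of bare 0/1 for the probe flag.
static constexpr int kNoMessagePending = 0;
static constexpr int kMessagePending = 1;

// Single poll: probe once; the caller simply moves on if nothing has arrived.
bool pollOnce(const Probe& probe)
{
  int flag = kNoMessagePending;  // default to "no message"
  probe(&flag);
  return flag != kNoMessagePending;
}

// Spin: probe until a message is pending. Returns the number of probes issued.
// maxProbes caps the loop only so that this sketch always terminates.
std::size_t spinUntilPending(const Probe& probe, std::size_t maxProbes)
{
  std::size_t probes = 0;
  int flag = kNoMessagePending;
  while(flag == kNoMessagePending && probes < maxProbes)
  {
    probe(&flag);
    ++probes;
  }
  return probes;
}

// Helper for illustration: a probe whose message "arrives" on its n-th call.
Probe arrivesOnCall(std::size_t n)
{
  auto calls = std::make_shared<std::size_t>(0);
  return [calls, n](int* flag) {
    ++(*calls);
    *flag = (*calls >= n) ? kMessagePending : kNoMessagePending;
  };
}
```

With a message that arrives on the third probe, `pollOnce` reports nothing on its first visit and costs one probe, while `spinUntilPending` issues three probes before returning. That is the overhead-versus-missed-message trade-off described above.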

@gberg617 force-pushed the feature/bergel1/lumberjack_nonblocking_communicator branch from 7921ec5 to 926fd00 on December 13, 2024 01:13
Contributor

@bmhan12 left a comment

LGTM!

@gberg617 force-pushed the feature/bergel1/lumberjack_nonblocking_communicator branch from 1b09d37 to 0c0da25 on January 17, 2025 18:44
@kennyweiss requested a review from gunney1 on January 27, 2025 22:22
Member

@white238 left a comment

The axom team needs to discuss the implications of this further before this is merged.

@gberg617
Author

@white238 @bmhan12 I talked with @gunney1 and we decided the best path forward is to duplicate the MPI communicator passed in the initialize() call, and have this duplicate owned by the Lumberjack communicator object. With this change, we can avoid having to create MPI tags for each non-collective communicator object, and instead have each communicator object have its own MPI communicator using the same default MPI tag. Please let me know if you have any further concerns.

Member

@kennyweiss left a comment

Thanks @gberg617

Sounds like we'll need to discuss some details before merging this. I've added a few minor comments in the meantime.

@@ -0,0 +1,135 @@
// Copyright (c) 2017-2024, Lawrence Livermore National Security, LLC and
Member

After merging in develop, please run our update_copyright_date.sh script to update the copyright on the new files to 2025.
See: https://github.com/LLNL/axom/blob/develop/scripts/update_copyright_date.sh

Comment on lines +44 to +45
* \param [in] ranksLimit Limit on how many ranks are individually tracked per
* Message.
Member

Suggestion:

Suggested change
- * \param [in] ranksLimit Limit on how many ranks are individually tracked per
- * Message.
+ * \param [in] ranksLimit Upper limit on number of ranks that are tracked per Message.

MPI_Comm_dup(comm, &m_mpiComm);
MPI_Comm_rank(m_mpiComm, &m_mpiCommRank);
MPI_Comm_size(m_mpiComm, &m_mpiCommSize);
m_ranksLimit = ranksLimit;
Member

Presumably, this needs to be an integer greater than or equal to 1. Should we check it with a SLIC_ASSERT?
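
A minimal sketch of such a check, using the standard `assert` macro as a stand-in for `SLIC_ASSERT` so that it compiles without Axom. `isValidRanksLimit` and `RanksLimitHolder` are hypothetical names introduced only for illustration; only `m_ranksLimit` and `ranksLimit` come from the diff.

```cpp
#include <cassert>

// Hypothetical helper: a ranks limit is usable only if at least one rank
// can be tracked per message.
constexpr bool isValidRanksLimit(int ranksLimit) { return ranksLimit >= 1; }

// Hypothetical holder mirroring the m_ranksLimit assignment in the diff.
struct RanksLimitHolder
{
  int m_ranksLimit = 1;

  void setRanksLimit(int ranksLimit)
  {
    // Stand-in for SLIC_ASSERT. Note that assert is compiled out when
    // NDEBUG is defined, so this guards debug builds only.
    assert(isValidRanksLimit(ranksLimit));
    m_ranksLimit = ranksLimit;
  }
};
```

Keeping the predicate separate from the assertion makes the rule itself testable, including for invalid values that would otherwise abort the test.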

* MPI communication uses default LJ_Tag.
*****************************************************************************
*/
const char* mpiNonBlockingReceiveMessages(MPI_Comm comm);
Member

Should we update the docs for this function (and mpiBlockingReceiveMessages) to indicate that the caller is responsible for deallocating the memory in the returned pointer?

Contributor

@gberg617 We should update docs as @kennyweiss suggested.
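
To illustrate what that ownership contract means for a caller, here is a sketch that assumes the returned buffer is allocated with `new[]`; the diff context does not show the allocation, so this is an assumption. `receiveStub`, `PackedMessages`, and `receiveOwned` are hypothetical names, and `receiveStub` only imitates a receive function that returns either a heap buffer or nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a receive function: returns a buffer allocated
// with new[] (assumption), or nullptr when no message has arrived.
const char* receiveStub(const char* incoming)
{
  if(incoming == nullptr)
  {
    return nullptr;
  }
  const std::size_t len = std::strlen(incoming);
  char* buffer = new char[len + 1];
  std::memcpy(buffer, incoming, len + 1);  // copy including the terminator
  return buffer;
}

// Caller side: take ownership immediately so the buffer is always released
// with delete[], even on early returns.
using PackedMessages = std::unique_ptr<const char[]>;

PackedMessages receiveOwned(const char* incoming)
{
  return PackedMessages(receiveStub(incoming));
}
```

If the buffer were allocated differently (for example with `malloc`), the same idea would apply with a custom deleter instead of the default `delete[]`.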

/*!
*****************************************************************************
* \brief Receives any Message sent to this rank, if there are any messages
* that are sent. Returns null if no messages are sent.
Contributor

@gunney1 Feb 5, 2025

When we return from a non-blocking probe, there could be sent messages that haven't arrived yet. I suggest replacing "are sent" with "have arrived." It's pedantic but can avoid confusion when the unexpected happens.

{
currPackedMessages = mpiNonBlockingReceiveMessages(m_mpiComm);

if(isPackedMessagesEmpty(currPackedMessages))
Contributor

It looks odd to me that currPackedMessages is used before checking if it's null.
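
One way to remove the ambiguity is to make the null check part of the emptiness test itself. The helper below is a hypothetical sketch; it is not the implementation of `isPackedMessagesEmpty`, which is not shown in this diff context and may already handle nullptr.

```cpp
#include <cassert>

// Hypothetical null-safe emptiness check: treats both "nothing received"
// (nullptr) and an empty string as empty, testing the pointer before
// dereferencing it.
bool isNullOrEmpty(const char* packedMessages)
{
  return packedMessages == nullptr || packedMessages[0] == '\0';
}
```

Because `||` short-circuits, the first character is read only when the pointer is non-null.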

@gberg617
Author

Update on the failing tests: The multiple_communicators test I added sporadically fails on Azure. This test is important to capture certain behavior when multiple communicators are sending/receiving messages. Looking into this, it sometimes fails when run with other unit tests (and only on Azure), but consistently passes when it is run by itself. I have heard that gtest+MPI is somewhat fragile in some cases, so I attempted to remove gtest from the tests I added by converting the ASSERT macros to my own version, and having each test become a function that is called within main() inside of lumberjack_NonCollectiveRootCommunicator.cpp. This fixes the issue, and both tests in this file always pass. Does the Axom team have any thoughts about potentially adding unit tests that do not use gtest? If there is a strong preference for requiring gtest, does anyone have thoughts on other solutions that can prevent these sporadic failures?

@bmhan12
Contributor

bmhan12 commented Feb 12, 2025

> Update on the failing tests...

For reference, this is the error seen on Azure Pipelines and when run locally in Docker:

102: Test command: /usr/bin/mpirun "-np" "7" "/home/axom/axom/build-clang@14.0.0-release/tests/lumberjack_mpi_tests" "--gtest_filter=lumberjack_NonCollectiveRootCommunicator*"
...
...
...
102: /home/axom/axom/src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunicator.hpp:121: Failure
102: Expected equality of these values:
102:   receivedPackedMessages_c1.size()
102:     Which is: 0
102:   1
102: 
102: 
102: lumberjack_mpi_tests:8831 terminated with signal 11 at PC=55a67bf82d09 SP=7ffe05eea260.  Backtrace:
102: /home/axom/axom/build-clang@14.0.0-release/tests/lumberjack_mpi_tests(+0x8d09)[0x55a67bf82d09]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x48)[0x7f8043294ef8]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing4Test3RunEv+0xcf)[0x7f80432790ef]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing8TestInfo3RunEv+0xf9)[0x7f804327a3d9]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing9TestSuite3RunEv+0x285)[0x7f804327aeb5]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x41c)[0x7f804328b81c]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x48)[0x7f8043295c18]
102: /home/axom/axom/build-clang@14.0.0-release/lib/libgtest.so.1.13.0(_ZN7testing8UnitTest3RunEv+0x59)[0x7f804328b3c9]
102: /home/axom/axom/build-clang@14.0.0-release/tests/lumberjack_mpi_tests(main+0x35)[0x55a67bf83075]
102: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f804015dd90]
102: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f804015de40]
102: /home/axom/axom/build-clang@14.0.0-release/tests/lumberjack_mpi_tests(+0x4f15)[0x55a67bf7ef15]
102: 
102: ===================================================================================
102: =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
102: =   PID 8832 RUNNING AT e17873b86e18
102: =   EXIT CODE: 9
102: =   CLEANING UP REMAINING PROCESSES
102: =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
102: ===================================================================================
102: YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
102: This typically refers to a problem with your application.
102: Please see the FAQ page for debugging suggestions

gberg617 and others added 2 commits February 19, 2025 10:52
…ator.hpp


attempting to fix sporadic failures on Azure.

Co-authored-by: Brian Han <han12@llnl.gov>
6 participants