Skip to content

Refactored mpirun to not exec a second executable #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: capstone-devel
Choose a base branch
from

Conversation

gagarc
Copy link

@gagarc gagarc commented Nov 12, 2024

Modified mpirun to no longer fork/exec prterun, but to instead instead call the function directly

@gagarc gagarc requested review from jsquyres and hppritcha November 12, 2024 00:08
Copy link

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good!

@gagarc gagarc force-pushed the gerardo-refactor-mpirun branch from 804d3c6 to 34fe233 Compare November 13, 2024 04:53
Copy link

@hppritcha hppritcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good, but we'll need to get the changes in prrte for adding prte_launch to the prte.h include file.

We will need for uofl-capstone-open-mpi/prrte#6 to be merged in then move the prrte sha in the capstone ompi repo forward to pick that change up before CI can pass for this PR.

@gagarc gagarc marked this pull request as ready for review November 21, 2024 14:28
@jsquyres
Copy link

@gagarc I think you need to rebase to the tip of capstone-devel. Github is saying you have conflicts with the submodule update.

You may find it easier to drop your prrte git submodule update, then rebase, then re-create your prrte git submodule update.

@gagarc gagarc force-pushed the gerardo-refactor-mpirun branch from 6d405ce to 9499b9d Compare November 30, 2024 02:02
@jsquyres
Copy link

@hppritcha The mpi4py CI failure here is legit. It stems from the OMPI DPM looking for the prte executable and (correctly) not finding it. See https://github.com/uofl-capstone-open-mpi/ompi/blob/capstone-devel/ompi/dpm/dpm.c#L1978-L2016. I see that dpm.c:find_prte() is invoked from dpm.c:start_dvm() (i.e., https://github.com/uofl-capstone-open-mpi/ompi/blob/capstone-devel/ompi/dpm/dpm.c#L2047).

Do we need a PRTE library way to start a DVM?

@hppritcha
Copy link

oopss it looks like a change on the prrte side caused the prte command to vanish. we need that.

@hppritcha hppritcha force-pushed the gerardo-refactor-mpirun branch from 9499b9d to 9437036 Compare December 6, 2024 00:11
@hppritcha
Copy link

okay now just getting known comm create from group thing in mpi4py. rerunning.

…he function directly in the process

Note this commit will need to be adjusted to reset .gitmodules before merging into Open MPI main

Signed-off-by: Gerardo Garcia <gagarc07@louisville.edu>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha hppritcha force-pushed the gerardo-refactor-mpirun branch from 79a0da3 to 9b7b0a4 Compare December 7, 2024 14:54
@hppritcha hppritcha self-requested a review December 7, 2024 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants