IO and SimGrid

<2023-10-16 Mon>

Ok, let’s try to clean the mess of this repo.

The objective is to:

  • trace IOR in SMPI
  • replay the trace

We can compile IOR with SMPI by overriding the Nix package:

ior-simgrid = pkgs.ior.overrideAttrs (finalAttrs: previousAttrs: {
    pname = "ior-simgrid";
    propagatedBuildInputs = [ pkgs.simgrid ];
    configurePhase = ''
      ./bootstrap && SMPI_PRETEND_CC=1 ./configure --prefix=$out MPICC=${pkgs.simgrid}/bin/smpicc CC=${pkgs.simgrid}/bin/smpicc
    '';
});

We can run IOR in SMPI in the `expe` shell: `nix develop .#expe`, and then

smpirun -np 16 -platform nancy-2023-07-18.xml $(which ior) -f script.ior

I used the `which` trick because it seems like smpirun could not find ior.

I added a shellHook to create a symlink for the ior bin.

So now we can run directly:

nix develop .#expe --command smpirun -np 16 -platform nancy-2023-07-18.xml ./ior.bin -f script.ior

you should get something like this:

[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '16'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to ''
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.KZo49G'
[0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation.  The timings will certainly not be accurate.  Use the option "--cfg=smpi/host-speed:<flops>" to set its value.  Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information.
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
Writing output to results_ior.json
ior ERROR: open64("testFile.00000000", 2) failed, errno 2, No such file or directory (/build/source/src/aiori-POSIX.c:473)
[graphite-1.grid5000.fr:0:(1) 30.000120] ../src/smpi/bindings/smpi_pmpi.cpp:138: [smpi_pmpi/WARNING] MPI_Abort was called, something went probably wrong in this simulation ! Killing all processes sharing the same MPI_COMM_WORLD
[30.000120] ../src/kernel/EngineImpl.cpp:718: [ker_engine/CRITICAL] Oops! Deadlock detected, some activities are still around but will never complete. This usually happens when the user code is not perfectly clean.
[30.000120] [ker_engine/INFO] 15 actors are still running, waiting for something.
[30.000120] [ker_engine/INFO] Legend of the following listing: "Actor <pid> (<name>@<host>): <status>"
[30.000120] [ker_engine/INFO] Actor 2 (1@graphite-2.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 3 (2@graphite-3.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 4 (3@graphite-4.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 5 (4@grimoire-1.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 6 (5@grimoire-2.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 7 (6@grimoire-3.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 8 (7@grimoire-4.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 9 (8@grimoire-5.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 10 (9@grimoire-6.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 11 (10@grimoire-7.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 12 (11@grimoire-8.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 13 (12@grisou-1.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 14 (13@grisou-10.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 15 (14@grisou-11.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [ker_engine/INFO] Actor 16 (15@grisou-12.grid5000.fr) simcall actor::ActivityWaitSimcall
[30.000120] [smpi/INFO] Stalling SMPI instance: smpirun. Do all your MPI ranks call MPI_Finalize()?
[30.000120] ../src/kernel/EngineImpl.cpp:286: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups
[30.000120] ../src/kernel/EngineImpl.cpp:286: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups

This seems to come from the platform used, which does not have any disk.

We provide a platform with disks, `hosts_with_disks.xml`. We will use MPIIO, and indicate the path where to write the files: `/scratch`.

the hostfile (`hostfile_io`) looks like:

bob
carl
bob
carl

We run IOR on this platform:

nix develop .#expe --command smpirun -np 4 -hostfile hostfile_io -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.YYCfoq'
[0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation.  The timings will certainly not be accurate.  Use the option "--cfg=smpi/host-speed:<flops>" to set its value.  Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information.
IOR-3.3.0: MPI Coordinated Test of Parallel I/O
Began               : Mon Oct 16 17:24:41 2023
Command line        : ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o /scratch/testFile -V
Machine             : Linux kagel
TestID              : 0
StartTime           : Mon Oct 16 17:24:41 2023
ior WARNING: unable to statfs() file system.

Options:
api                 : MPIIO
apiVersion          : (3.1)
test filename       : /scratch/testFile
access              : file-per-process
type                : independent
segments            : 16
ordering in a file  : sequential
ordering inter file : no tasks offsets
nodes               : 2
tasks               : 4
clients per node    : 2
repetitions         : 1
xfersize            : 1 MiB
blocksize           : 16 MiB
aggregate filesize  : 1 GiB

Results:

access    bw(MiB/s)  IOPS       Latency(s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ----       ----------  ---------- ---------  --------   --------   --------   --------   ----
write     36.17      36.17      1.44        16384      1024.00    0.000000   28.31      5.24       28.31      0
read      76.78      76.78      0.505938    16384      1024.00    0.000000   13.34      5.24       13.34      0
remove    -          -          -           -          -          -          -          -          0.000908   0
Max Write: 36.17 MiB/sec (37.93 MB/sec)
Max Read:  76.78 MiB/sec (80.51 MB/sec)

Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)     StdDev   Max(OPs)   Min(OPs)  Mean(OPs)     StdDev    Mean(s) Stonewall(s) Stonewall(MiB) Test
# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt   blksiz    xsize aggs(MiB)   API RefNum
write          36.17      36.17      36.17       0.00      36.17      36.17      36.17       0.00   28.30982         NA            NA     0
    4   2    1   1     0        1         0    0     16 16777216  1048576    1024.0 MPIIO      0
read           76.78      76.78      76.78       0.00      76.78      76.78      76.78       0.00   13.33615         NA            NA     0
    4   2    1   1     0        1         0    0     16 16777216  1048576    1024.0 MPIIO      0
Finished            : Mon Oct 16 17:24:41 2023

Ok nice!
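
A quick sanity check of the numbers reported by IOR above. This is only a sketch of the arithmetic (the helper names are mine, not IOR's): 4 tasks x 16 segments x 16 MiB blocks gives the 1 GiB aggregate size, and dividing by the write time (28.30982 s) gives the reported 36.17 MiB/s.

```cpp
#include <cstdint>

// Aggregate volume in MiB: tasks x segments x block size (in MiB).
// Hypothetical helper, it only mirrors the options printed by IOR.
constexpr std::uint64_t aggregate_mib(std::uint64_t tasks, std::uint64_t segments, std::uint64_t block_mib)
{
  return tasks * segments * block_mib;
}

// Bandwidth in MiB/s for a given volume (MiB) and elapsed time (s).
constexpr double bandwidth_mib_s(double volume_mib, double seconds)
{
  return volume_mib / seconds;
}

// Convert MiB/s to MB/s (1 MiB = 1048576 bytes, 1 MB = 1000000 bytes).
constexpr double mib_to_mb(double mib)
{
  return mib * 1048576.0 / 1000000.0;
}
```

With the values of the run above, `bandwidth_mib_s(aggregate_mib(4, 16, 16), 28.30982)` is about 36.17 and `mib_to_mb(36.17)` is about 37.93, which matches the "Max Write" line.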

there is a warning, but I think it is because the io operations managed by simgrid do not support statfs

Ok now let’s trace ior

for this we need to ask smpirun for a time-independent trace (-trace-ti) and tell it where to write the resulting trace

nix develop .#expe --command smpirun -trace-ti --cfg=tracing/filename:ior-trace-smpi -np 4 -hostfile hostfile_io -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V

There should now be one `ior-trace-smpi` file and a folder named `ior-trace-smpi_files` with the traces per rank

the `ior-trace-smpi` simply contains the paths to the individual traces
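
A minimal sketch of how such an index file could be read, assuming it is plain text with one trace path per line (that layout is what I see in the file, the function name is made up):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read an index stream with one trace path per line.
// Trailing blanks / carriage returns are stripped and empty lines are skipped.
std::vector<std::string> read_trace_index(std::istream& in)
{
  std::vector<std::string> paths;
  std::string line;
  while (std::getline(in, line)) {
    while (!line.empty() && (line.back() == '\r' || line.back() == ' ' || line.back() == '\t'))
      line.pop_back();
    if (!line.empty())
      paths.push_back(line);
  }
  return paths;
}
```

It takes a `std::istream` so it works with an `std::ifstream` opened on `ior-trace-smpi` as well as with an in-memory string.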

If we open one of those traces, we can see the operations done and their sizes

For example the first 40 lines for rank 1:

0 init
0 compute 0.1304
0 allreduce 1 0 1 
0 compute 0.05926
0 compute 0.02646
0 allreduce 1 0 1 
0 compute 0.13312
0 bcast 4 0 6 
0 compute 0.04292
0 bcast 4096 0 2 
0 compute 0.02584
0 gather 1 1 0 1 1
0 bcast 1 0 1 
0 compute 0.04256
0 barrier
0 compute 0.02194
0 reduce 1 0 0 0 
0 reduce 1 0 0 0 
0 bcast 1 0 0 
0 bcast 1 0 1 
0 compute 2.09196
0 compute 0.13792
0 bcast 1 0 11 
0 compute 9.745
0 compute 0.0286
0 compute 0.02298
0 barrier
0 compute 0.02346
0 compute 0.05612
0 compute 0.24912
0 compute 0.06424
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06
0 IO - write 1.04858e+06

Ok! very nice we just need to replay this in SMPI now!

so there is an IO replayer in the simgrid repo: https://framagit.org/simgrid/simgrid/-/tree/master/examples/cpp/replay-io

/* Copyright (c) 2017-2023. The SimGrid Team. All rights reserved.          */

/* This program is free software; you can redistribute it and/or modify it
 * under the terms of the license (GNU LGPL) which comes with this package. */

#include <simgrid/plugins/file_system.h>
#include <simgrid/s4u.hpp>
#include <xbt/replay.hpp>
#include <xbt/str.h>

#include <boost/algorithm/string/join.hpp>

XBT_LOG_NEW_DEFAULT_CATEGORY(replay_io, "Messages specific for this example");
namespace sg4 = simgrid::s4u;

#define ACT_DEBUG(...)                                                                                                 \
  if (XBT_LOG_ISENABLED(replay_io, xbt_log_priority_verbose)) {                                                        \
    std::string NAME = boost::algorithm::join(action, " ");                                                            \
    XBT_DEBUG(__VA_ARGS__);                                                                                            \
  } else                                                                                                               \
    ((void)0)

class Replayer {
  static std::unordered_map<std::string, sg4::File*> opened_files;

  static void log_action(const simgrid::xbt::ReplayAction& action, double date)
  {
    if (XBT_LOG_ISENABLED(replay_io, xbt_log_priority_verbose)) {
      std::string s = boost::algorithm::join(action, " ");
      XBT_VERB("%s %f", s.c_str(), date);
    }
  }

  static sg4::File* get_file_descriptor(const std::string& file_name)
  {
    std::string full_name = sg4::this_actor::get_name() + ":" + file_name;
    return opened_files.at(full_name);
  }

public:
  explicit Replayer(std::vector<std::string> args)
  {
    const char* actor_name = args[0].c_str();
    if (args.size() > 1) { // split mode, the trace file was provided in the deployment file
      const char* trace_filename = args[1].c_str();
      simgrid::xbt::replay_runner(actor_name, trace_filename);
    } else { // Merged mode
      simgrid::xbt::replay_runner(actor_name);
    }
  }

  void operator()() const
  {
    // Nothing to do here
  }

  /* My actions */
  static void open(simgrid::xbt::ReplayAction& action)
  {
    std::string file_name = action[2];
    double clock          = sg4::Engine::get_clock();
    std::string full_name = sg4::this_actor::get_name() + ":" + file_name;

    ACT_DEBUG("Entering Open: %s (filename: %s)", NAME.c_str(), file_name.c_str());
    auto* file = sg4::File::open(file_name, nullptr);
    opened_files.try_emplace(full_name, file);

    log_action(action, sg4::Engine::get_clock() - clock);
  }

  static void read(simgrid::xbt::ReplayAction& action)
  {
    std::string file_name = action[2];
    sg_size_t size        = std::stoul(action[3]);
    double clock          = sg4::Engine::get_clock();

    sg4::File* file = get_file_descriptor(file_name);

    ACT_DEBUG("Entering Read: %s (size: %llu)", NAME.c_str(), size);
    file->read(size);

    log_action(action, sg4::Engine::get_clock() - clock);
  }

  static void close(simgrid::xbt::ReplayAction& action)
  {
    std::string file_name = action[2];
    std::string full_name = sg4::this_actor::get_name() + ":" + file_name;
    double clock          = sg4::Engine::get_clock();

    ACT_DEBUG("Entering Close: %s (filename: %s)", NAME.c_str(), file_name.c_str());
    auto entry = opened_files.find(full_name);
    xbt_assert(entry != opened_files.end(), "File not found in opened files: %s", full_name.c_str());
    entry->second->close();
    opened_files.erase(entry);
    log_action(action, sg4::Engine::get_clock() - clock);
  }
};

std::unordered_map<std::string, sg4::File*> Replayer::opened_files;

int main(int argc, char* argv[])
{
  sg4::Engine e(&argc, argv);
  sg_storage_file_system_init();

  xbt_assert(argc > 3,
             "Usage: %s platform_file deployment_file [action_files]\n"
             "\texample: %s platform.xml deployment.xml actions # if all actions are in the same file\n"
             "\t# if actions are in separate files, specified in deployment\n"
             "\texample: %s platform.xml deployment.xml",
             argv[0], argv[0], argv[0]);

  e.load_platform(argv[1]);
  e.register_actor<Replayer>("p0");
  e.load_deployment(argv[2]);

  if (argv[3] != nullptr)
    xbt_replay_set_tracefile(argv[3]);

  /*   Action registration */
  xbt_replay_action_register("open", Replayer::open);
  xbt_replay_action_register("read", Replayer::read);
  xbt_replay_action_register("close", Replayer::close);

  e.run();

  XBT_INFO("Simulation time %g", sg4::Engine::get_clock());

  return 0;
}

this replayer only knows 3 operations: open, read, and close. So this is not enough to replay our trace yet

For now we will just try to print something when we detect a read or write operation in the trace:

/* Copyright (c) 2009-2023. The SimGrid Team. All rights reserved.          */

/* This program is free software; you can redistribute it and/or modify it
 * under the terms of the license (GNU LGPL) which comes with this package. */

#include "xbt/replay.hpp"
#include <simgrid/s4u/Actor.hpp>
#include "smpi/smpi.h"
#include "xbt/asserts.h"
#include "xbt/str.h"

#include "xbt/log.h"
XBT_LOG_NEW_DEFAULT_CATEGORY(replay_test, "Messages specific for this example");

/* This shows how to extend the trace format by adding a new kind of events.
   These functions are registered through xbt_replay_action_register() below. */
static void action_io_write(const simgrid::xbt::ReplayAction& args)
{
  /* Called for each write event found in the trace.
     args is a strings array containing the blank-separated parameters found in the trace for this event instance. */
  XBT_INFO("io_write!");
}

static void action_io_read(const simgrid::xbt::ReplayAction& args)
{
  /* Called for each read event found in the trace (same args layout as above). */
  XBT_INFO("io_read!");
}

int main(int argc, char* argv[])
{
  const auto* properties = simgrid::s4u::Actor::self()->get_properties();

  const char* instance_id = properties->at("instance_id").c_str();
  const int rank          = static_cast<int>(xbt_str_parse_int(properties->at("rank").c_str(), "Cannot parse rank"));
  const char* shared_trace =
      simgrid::s4u::Actor::self()->get_property("tracefile"); // Cannot use properties because this can be nullptr
  const char* private_trace  = argv[1];
  double start_delay_flops   = 0;

  if (argc > 2) {
    start_delay_flops = xbt_str_parse_double(argv[2], "Cannot parse start_delay_flops");
  }

  /* Setup things and register default actions */
  smpi_replay_init(instance_id, rank, start_delay_flops);

  /* Connect your callback function to the "blah" event in the trace files */
  xbt_replay_action_register("IO - write", action_io_write);
  xbt_replay_action_register("IO - read", action_io_read);

  /* The regular run of the replayer */
  if (shared_trace != nullptr)
    xbt_replay_set_tracefile(shared_trace);
  smpi_replay_main(rank, private_trace);
  return 0;
}

To compile, run:

nix develop .#dev --command smpicxx replay.cpp -o myreplay -std=c++17
nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_io myreplay
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/replay' to 'ior-trace-smpi'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.3yPMvS'
[0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation.  The timings will certainly not be accurate.  Use the option "--cfg=smpi/host-speed:<flops>" to set its value.  Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information.
[bob:0:(1) 0.004911] ../src/xbt/xbt_replay.cpp:95: [root/CRITICAL] Replay Error: action IO is unknown, please register it properly in the replay engine
myreplay --cfg=smpi/privatization:1  --cfg=surf/precision:1e-9 --cfg=network/model:SMPI --cfg=smpi/tmpdir:/tmp/nix-shell.3yPMvS hosts_with_disks.xml
Execution failed with code 134.

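Argh, the issue seems to be the parsing of the io operation in the trace: judging by the error message, the line is split on blanks, so “IO - write” is read as an action named “IO”.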

Let’s rename “IO - write” to “io-write”, and same for read
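
To see why the rename helps, here is a small sketch of the tokenization, assuming the replay engine splits each trace line on blanks and takes the second field as the action name (this is what the error message suggests, I did not check the SimGrid code; the helper names are mine):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a trace line on blanks, the way the replay engine seems to do it.
std::vector<std::string> split_on_blanks(const std::string& line)
{
  std::istringstream in(line);
  std::vector<std::string> tokens;
  for (std::string tok; in >> tok;)
    tokens.push_back(tok);
  return tokens;
}

// The action name is assumed to be the second field (the first one is the rank).
std::string action_name(const std::string& line)
{
  const std::vector<std::string> tokens = split_on_blanks(line);
  return tokens.size() > 1 ? tokens[1] : std::string();
}
```

With this model, `"0 IO - write 1.04858e+06"` gives the action `IO` (and pushes the size to the fifth field), while `"0 io-write 1.04858e+06"` gives `io-write` with the size as third field.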

find ./ior-trace-smpi_files -type f -exec sed -i -e 's/IO - write/io-write/g' -e 's/IO - read/io-read/g' {} +
sed -i -e 's/IO - write/io-write/g' -e 's/IO - read/io-read/g' ./replay.cpp

and recompile

nix develop .#dev --command smpicxx replay.cpp -o myreplay -std=c++17
nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_io myreplay

and we get the following:

[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/replay' to 'ior-trace-smpi'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.MTIbcL'
[0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation.  The timings will certainly not be accurate.  Use the option "--cfg=smpi/host-speed:<flops>" to set its value.  Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information.
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!
[bob:0:(1) 0.004911] [replay_test/INFO] io_write!

Ok good! let’s see what information is available

with a small change (printing the arguments) we can see that the available information is: `[bob:0:(1) 0.004911] [replay_test/INFO] 0 io-write 1.04858e+06`

so the rank, the operation, and the size

so, as it is, not awesome for us: we would also like to know which file is accessed, and other information.
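
A sketch of how those three fields can be parsed (the struct and function names are made up for the example). Note that the size is written in floating-point notation with few digits, so `1.04858e+06` parses to 1048580 bytes and not to the 1048576 bytes (1 MiB) that IOR actually transfers:

```cpp
#include <cmath>
#include <cstdint>
#include <sstream>
#include <string>

// One IO event of the trace: "<rank> <operation> <size>".
struct IoEvent {
  int rank;
  std::string op;
  std::uint64_t size; // in bytes, as recovered from the trace
};

// Parse a line such as "0 io-write 1.04858e+06".
// std::stod throws if the size field is missing or not a number.
IoEvent parse_io_event(const std::string& line)
{
  std::istringstream in(line);
  IoEvent ev{};
  std::string size_text;
  in >> ev.rank >> ev.op >> size_text;
  ev.size = static_cast<std::uint64_t>(std::llround(std::stod(size_text)));
  return ev;
}
```

So the replayed volume is slightly off (4 bytes per 1 MiB transfer here) unless the tracing side prints the size with more digits.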

but for now let’s continue and try to write the correct amount somewhere

the doc seems to be here: https://simgrid.org/doc/latest/app_s4u.html#i-o-operations

inspired by https://framagit.org/simgrid/simgrid/-/blob/master/examples/cpp/io-file-system/s4u-io-file-system.cpp, we get

static void action_io_write(const simgrid::xbt::ReplayAction& args)
{
  /* args is a strings array containing the blank-separated parameters found in the trace:
     args[0] is the rank, args[1] the action name, args[2] the size. */
    XBT_INFO("io_write!");
    // hard-coded for now: the trace does not contain the file name
    std::string filename     = "/scratch/testFile";
    auto* file               = simgrid::s4u::File::open(filename, nullptr);
    // the size is written in floating-point notation in the trace (e.g. 1.04858e+06)
    long int io_size = std::atof(args[2].c_str());
    sg_size_t write = file->write(io_size);
    XBT_INFO("Create a %llu bytes file named '%s' on /scratch : %ld", write, filename.c_str(), io_size);
    file->close();
}

static void action_io_read(const simgrid::xbt::ReplayAction& args)
{
  /* Same layout of args as for action_io_write. */
    XBT_INFO("io_read!");
    std::string filename     = "/scratch/testFile";
    auto* file               = simgrid::s4u::File::open(filename, nullptr);
    long int io_size = std::atof(args[2].c_str());
    sg_size_t read = file->read(io_size);
    XBT_INFO("Read a %llu bytes file named '%s' on /scratch : %ld", read, filename.c_str(), io_size);
    file->close();
}

But sometimes, we get some weird behavior:

[carl:1:(2) 6.169625] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580
[carl:1:(2) 6.169625] [replay_test/INFO] io_write!
[carl:1:(2) 6.190098] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580
[carl:3:(4) 6.190098] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580

it sees that it must write 1048580 bytes, but writes 0

weird

let’s run only on `bob`

nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob myreplay

This is much better

ok so there is something to investigate with remote disks (carl mounts a disk from bob)

The tracing seems to be done here: https://framagit.org/simgrid/simgrid/-/blob/master/src/smpi/bindings/smpi_pmpi_file.cpp

so we can try to modify this to have at least the name of the file

It seems like it shouldn’t be too difficult as we can pass anything to the trace function:

src/instr/instr_private.hpp:  explicit CpuTIData(const std::string&); // disallow this constructor inherited from TIData

argh actually no, we’ll need to investigate more.

but tomorrow!

<2023-10-17 Tue>

ok

so the class for the smpi files is available here: src/smpi/include/smpi_file.hpp

I added the name of the file to the trace, and renamed the “IO - write” to “io-write” directly in simgrid (same for read)

diff --git a/src/smpi/bindings/smpi_pmpi_file.cpp b/src/smpi/bindings/smpi_pmpi_file.cpp
index e8dfd51da4..4a519d4795 100644
--- a/src/smpi/bindings/smpi_pmpi_file.cpp
+++ b/src/smpi/bindings/smpi_pmpi_file.cpp
@@ -99,7 +99,8 @@ int PMPI_File_read(MPI_File fh, void *buf, int count,MPI_Datatype datatype, MPI_
   PASS_ZEROCOUNT(count)
   const SmpiBenchGuard suspend_bench;
   aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
-  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - read", count * datatype->size()));
+  auto plop = "io-read " + std::string(fh->name());
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size()));
   int ret = simgrid::smpi::File::read(fh, buf, count, datatype, status);
   TRACE_smpi_comm_out(rank_traced);
   return ret;
@@ -124,7 +125,8 @@ int PMPI_File_write(MPI_File fh, const void *buf, int count,MPI_Datatype datatyp
   PASS_ZEROCOUNT(count)
   const SmpiBenchGuard suspend_bench;
   aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
-  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - write", count * datatype->size()));
+  auto plop = "io-write " + std::string(fh->name());
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size()));
   int ret = simgrid::smpi::File::write(fh, const_cast<void*>(buf), count, datatype, status);
   TRACE_smpi_comm_out(rank_traced);
   return ret;
diff --git a/src/smpi/include/smpi_file.hpp b/src/smpi/include/smpi_file.hpp
index 60ab2fde22..c74e3e0855 100644
--- a/src/smpi/include/smpi_file.hpp
+++ b/src/smpi/include/smpi_file.hpp
@@ -44,7 +44,8 @@ class File : public F2C{
   int flags() const;
   MPI_Datatype etype() const;
   MPI_Comm comm() const;
-  std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");}
+  // std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");}
+  std::string name() const override {return file_ ? std::string(file_->get_path()): std::string("MPI_File");}
 
   int sync();
   int seek(MPI_Offset offset, int whence);

Alright, we can now integrate the information into the replayer

static void action_io_write(const simgrid::xbt::ReplayAction& args)
{
    std::string filename     = args[2];
    auto* file               = simgrid::s4u::File::open(filename, nullptr);
    long int io_size = std::atof(args[3].c_str());
    sg_size_t write = file->write(io_size);
    XBT_INFO("Create a %llu bytes file named '%s' on /scratch : %ld", write, filename.c_str(), io_size);
    file->close();
}

static void action_io_read(const simgrid::xbt::ReplayAction& args)
{
    std::string filename     = args[2];
    auto* file               = simgrid::s4u::File::open(filename, nullptr);
    long int io_size = std::atof(args[3].c_str());
    sg_size_t read = file->read(io_size);
    XBT_INFO("Read a %llu bytes file named '%s' on /scratch : %ld", read, filename.c_str(), io_size);
    file->close();
}

nice, we can write to the correct files now!

Ok it would now be very nice if we can visualize the smpi run of ior and the replay of the trace.

we will follow: https://simgrid.org/contrib/R_visualization.html

We can generate the pajeng trace:

nix develop .#expe --command smpirun -trace --cfg=tracing/filename:ior-trace-vizu -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V

and plot it with the following `script.R`:

# Read the data
library(tidyverse)
library(pajengr)
dta <- pajeng_read("ior-trace-vizu")

# Manipulate the data
dta$state %>%
   # Remove some unnecessary columns for this example
   select(-Type, -Imbrication) %>%
   # Create the nice MPI rank and operations identifiers
   mutate(Container = as.integer(gsub("rank-", "", Container)),
          Value = gsub("^PMPI_", "MPI_", Value)) %>%
   # Rename some columns so it can better fit MPI terminology
   rename(Rank = Container,
          Operation = Value) -> df.states

# Draw the Gantt Chart
df.states %>%
   ggplot() +
   # Each MPI operation is becoming a rectangle
   geom_rect(aes(xmin=Start, xmax=End,
                 ymin=Rank,  ymax=Rank + 1,
                 fill=Operation)) +
   # Cosmetics
   xlab("Time [seconds]") +
   ylab("Rank [count]") +
   theme_bw(base_size=14) +
   theme(
     plot.margin = unit(c(0,0,0,0), "cm"),
     legend.margin = margin(t = 0, unit='cm'),
     panel.grid = element_blank(),
     legend.position = "top",
     legend.justification = "left",
     legend.box.spacing = unit(0, "pt"),
     legend.box.margin = margin(0,0,0,0),
     legend.title = element_text(size=10)) -> plot

# Save the plot in a PNG file (dimensions in inches)
ggsave("ior-trace.png",
       plot,
       width = 10,
       height = 3)

nix develop .#rshell --command Rscript script.R

./ior-trace.png

Well ok, let’s now try with the replayer (i dont even know if it’s possible)

nix develop .#dev --command smpirun -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob -trace --cfg=tracing/filename:ior-trace-replay -replay ior-trace-smpi2 myreplay

Let’s change the Rscript to accept arguments

nix develop .#rshell --command Rscript script.R ior-trace-vizu ior-trace.png
nix develop .#rshell --command Rscript script.R ior-trace-replay ior-replay.png

./ior-trace.png ./ior-replay.png

hummm nothing on the replay argh

this is probably because we dont use MPI_File_read but read directly …

Ok, let’s try to call MPI functions in the replayer

fuck i’ll write a quick makefile i had enough

Adrien had this for his expes

smpirun -replay trace_files.txt -np 16 -platform platform.xml -hostfile hostfile -trace -trace-file $@ --cfg=tracing/uncategorized:yes --cfg=host/model:ptask_L07

So, after some time fighting here is what i have done:

  • added tracing on MPI_File_open and MPI_File_close
  • in the replayer, added a hashmap to remember the opened files
  • in the replayer, added support for the open and close operations

diff --git a/src/smpi/bindings/smpi_pmpi_file.cpp b/src/smpi/bindings/smpi_pmpi_file.cpp
index e8dfd51da4..292e417d3b 100644
--- a/src/smpi/bindings/smpi_pmpi_file.cpp
+++ b/src/smpi/bindings/smpi_pmpi_file.cpp
@@ -43,13 +43,21 @@ int PMPI_File_open(MPI_Comm comm, const char *filename, int amode, MPI_Info info
   if (amode < 0)
     return MPI_ERR_AMODE;
   const SmpiBenchGuard suspend_bench;
+
+  aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
+  auto plop = "io-open " + std::string(filename);
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, 0));
   *fh =  new simgrid::smpi::File(comm, filename, amode, info);
+  TRACE_smpi_comm_out(rank_traced);
+
   if ((*fh)->size() != 0 && (amode & MPI_MODE_EXCL)){
     delete fh;
     return MPI_ERR_AMODE;
   }
   if(amode & MPI_MODE_APPEND)
     (*fh)->seek(0,MPI_SEEK_END);
   return MPI_SUCCESS;
 }
 
@@ -57,7 +65,11 @@ int PMPI_File_close(MPI_File *fh){
   CHECK_NULL(2, MPI_ERR_ARG, fh)
   CHECK_COLLECTIVE((*fh)->comm(), __func__)
   const SmpiBenchGuard suspend_bench;
+  aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
+  auto plop = "io-close " + std::string((*fh)->name());
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, 0));
   int ret = simgrid::smpi::File::close(fh);
+  TRACE_smpi_comm_out(rank_traced);
   *fh = MPI_FILE_NULL;
   return ret;
 }
@@ -99,7 +111,8 @@ int PMPI_File_read(MPI_File fh, void *buf, int count,MPI_Datatype datatype, MPI_
   PASS_ZEROCOUNT(count)
   const SmpiBenchGuard suspend_bench;
   aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
-  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - read", count * datatype->size()));
+  auto plop = "io-read " + std::string(fh->name());
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size()));
   int ret = simgrid::smpi::File::read(fh, buf, count, datatype, status);
   TRACE_smpi_comm_out(rank_traced);
   return ret;
@@ -124,7 +137,8 @@ int PMPI_File_write(MPI_File fh, const void *buf, int count,MPI_Datatype datatyp
   PASS_ZEROCOUNT(count)
   const SmpiBenchGuard suspend_bench;
   aid_t rank_traced = simgrid::s4u::this_actor::get_pid();
-  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - write", count * datatype->size()));
+  auto plop = "io-write " + std::string(fh->name());
+  TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size()));
   int ret = simgrid::smpi::File::write(fh, const_cast<void*>(buf), count, datatype, status);
   TRACE_smpi_comm_out(rank_traced);
   return ret;
diff --git a/src/smpi/include/smpi_file.hpp b/src/smpi/include/smpi_file.hpp
index 60ab2fde22..c74e3e0855 100644
--- a/src/smpi/include/smpi_file.hpp
+++ b/src/smpi/include/smpi_file.hpp
@@ -44,7 +44,8 @@ class File : public F2C{
   int flags() const;
   MPI_Datatype etype() const;
   MPI_Comm comm() const;
-  std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");}
+  // std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");}
+  std::string name() const override {return file_ ? std::string(file_->get_path()): std::string("MPI_File");}
 
   int sync();
   int seek(MPI_Offset offset, int whence);
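
The diff above only shows the SimGrid side. On the replayer side, the bookkeeping of opened files can be sketched like this. It is a simplified stand-alone version: `FakeFile` stands in for the simulated file handle, the key is built from the rank and the path, and none of these names come from SimGrid.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for a simulated file handle (hypothetical, not a SimGrid type).
struct FakeFile {
  std::string path;
  std::uint64_t position = 0;
};

// Remembers which files are currently opened, per rank.
class OpenFiles {
  std::unordered_map<std::string, FakeFile> files_;

  static std::string key(int rank, const std::string& path) { return std::to_string(rank) + ":" + path; }

public:
  // io-open: register the file, returns false if this rank already opened it.
  bool open(int rank, const std::string& path) { return files_.try_emplace(key(rank, path), FakeFile{path}).second; }

  // io-read / io-write: look up the handle, nullptr if the file is not opened.
  FakeFile* find(int rank, const std::string& path)
  {
    auto it = files_.find(key(rank, path));
    return it == files_.end() ? nullptr : &it->second;
  }

  // io-close: forget the file, returns false if it was not opened.
  bool close(int rank, const std::string& path) { return files_.erase(key(rank, path)) == 1; }

  std::size_t size() const { return files_.size(); }
};
```

Keeping the handle between open and close (instead of opening and closing around each read or write, as in the first version of the replayer) also keeps the file position from one operation to the next.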

So now we get the same simulation time between the ior in smpi and the replayer!!!!

nix develop .#expe --command smpirun --cfg=smpi/display-timing:yes -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V

[37.580965] [smpi_utils/INFO] Simulated time: 37.581 seconds

Then we trace again and replay:

nix develop .#expe --command smpirun -trace-ti --cfg=tracing/filename:ior-trace-smpi_open -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V
nix develop .#expe --command smpirun -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob --cfg=smpi/display-timing:yes -replay ior-trace-smpi_open myreplay

[37.581108] [smpi_utils/INFO] Simulated time: 37.5811 seconds.

Pretty good!

Let’s write some “doc”

ok let’s now try with a different platform.

we will have 4 nodes that all mount a disk on a remote host.

then in the hostfile we will have the 4 nodes

it is in scenario `remote-disk`

ok but now when i try with a single file in ior the replayer does not work.

I suppose that is just that we do not trace enough information to replay properly.

so we also need to trace:

  • PMPI_File_delete
  • PMPI_File_seek_*
  • PMPI_File_get_position_*

    Also, if we remove the `-V` flag of IOR (useFileView – use MPI_File_set_view), the replayer crashes because IOR then uses different functions (PMPI_File_read_at for example). This needs a tiny bit of work, but should be ok: we just need to also pass the information about the offset

    We’ll see next time

<2023-10-19 Thu>

ok let’s implement all the operations remaining

for file_read_at, the seek method (whence) is MPI_SEEK_SET, i.e. the offset is counted from the start of the file
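
A toy model of this, assuming a read_at can be replayed as a seek to the absolute offset followed by a plain read. `ToyFile` and the `toy_*` functions are made up for the example, and the handling of the file pointer in real MPI may differ:

```cpp
#include <algorithm>
#include <cstdint>

// Toy file: only a size and a current position (both in bytes).
struct ToyFile {
  std::int64_t size;
  std::int64_t position;
};

enum class Whence { Set, Cur, End };

// Move the position: from the start (Set), the current position (Cur) or the end (End).
void toy_seek(ToyFile& f, std::int64_t offset, Whence whence)
{
  std::int64_t base = 0;
  if (whence == Whence::Cur)
    base = f.position;
  else if (whence == Whence::End)
    base = f.size;
  f.position = std::max<std::int64_t>(0, base + offset);
}

// Read up to count bytes from the current position, returns the amount actually read.
std::int64_t toy_read(ToyFile& f, std::int64_t count)
{
  const std::int64_t available = std::max<std::int64_t>(0, f.size - f.position);
  const std::int64_t done      = std::min(count, available);
  f.position += done;
  return done;
}

// read_at modelled as: seek to the absolute offset, then read.
std::int64_t toy_read_at(ToyFile& f, std::int64_t offset, std::int64_t count)
{
  toy_seek(f, offset, Whence::Set);
  return toy_read(f, count);
}
```

This is why the offset has to be in the trace: with only the size, the replayer cannot know where in the file the read starts.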

Ok, implemented what was needed in the replay to drop the `-V` flag of IOR

There are still some issues with the single file…

It is not very clear what the issue is with a single file

there is probably some weird stuff going on with seeks

but otherwise, it seems to work well!