Ok, let’s try to clean the mess of this repo.
The objective is to:
- trace IOR in SMPI
- replay the trace
We can compile IOR with SMPI
ior-simgrid = pkgs.ior.overrideAttrs (finalAttrs: previousAttrs: { pname = "ior-simgrid"; propagatedBuildInputs = [ pkgs.simgrid ]; configurePhase = '' ./bootstrap && SMPI_PRETEND_CC=1 ./configure --prefix=$out MPICC=${pkgs.simgrid}/bin/smpicc CC=${pkgs.simgrid}/bin/smpicc ''; });
We can run IOR in SMPI in the `expe` shell: `nix develop .#expe`, and then
smpirun -np 16 -platform nancy-2023-07-18.xml $(which ior) -f script.ior
I used the `which` trick because it seems like smpi could not find ior.
I added a shellHook to create a simlink for the ior bin.
So now we can run directly:
nix develop .#expe --command smpirun -np 16 -platform nancy-2023-07-18.xml ./ior.bin -f script.ior
you should get something like this:
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '16' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to '' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.KZo49G' [0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation. The timings will certainly not be accurate. Use the option "--cfg=smpi/host-speed:<flops>" to set its value. Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information. Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json Writing output to results_ior.json ior ERROR: open64("testFile.00000000", 2) failed, errno 2, No such file or directory (/build/source/src/aiori-POSIX.c:473) [graphite-1.grid5000.fr:0:(1) 30.000120] ../src/smpi/bindings/smpi_pmpi.cpp:138: [smpi_pmpi/WARNING] MPI_Abort was called, something went probably wrong in this simulation ! Killing all processes sharing the same MPI_COMM_WORLD [30.000120] ../src/kernel/EngineImpl.cpp:718: [ker_engine/CRITICAL] Oops! Deadlock detected, some activities are still around but will never complete. This usually happens when the user code is not perfectly clean. [30.000120] [ker_engine/INFO] 15 actors are still running, waiting for something. [30.000120] [ker_engine/INFO] Legend of the following listing: "Actor <pid> (<name>@<host>): <status>" [30.000120] [ker_engine/INFO] Actor 2 (1@graphite-2.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 3 (2@graphite-3.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 4 (3@graphite-4.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 5 (4@grimoire-1.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 6 (5@grimoire-2.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 7 (6@grimoire-3.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 8 (7@grimoire-4.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 9 (8@grimoire-5.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 10 (9@grimoire-6.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 11 (10@grimoire-7.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 12 (11@grimoire-8.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 13 (12@grisou-1.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 14 (13@grisou-10.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 15 (14@grisou-11.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [ker_engine/INFO] Actor 16 (15@grisou-12.grid5000.fr) simcall actor::ActivityWaitSimcall [30.000120] [smpi/INFO] Stalling SMPI instance: smpirun. Do all your MPI ranks call MPI_Finalize()? [30.000120] ../src/kernel/EngineImpl.cpp:286: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups [30.000120] ../src/kernel/EngineImpl.cpp:286: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups
There are some issues with the lack of disk so the platform used
We provide a platform with disks `hosts_with_disks.xml` We will use MPIIO, and indicate the path where to write the files `/scratch`
the hostfile looks like:
bob carl bob carl
nix develop .#expe --command smpirun -np 4 -hostfile hostfile_io -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.YYCfoq' [0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation. The timings will certainly not be accurate. Use the option "--cfg=smpi/host-speed:<flops>" to set its value. Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchm arking-of-smpi-code for more information. IOR-3.3.0: MPI Coordinated Test of Parallel I/O Began : Mon Oct 16 17:24:41 2023 Command line : ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o /scratch/testFile -V Machine : Linux kagel TestID : 0 StartTime : Mon Oct 16 17:24:41 2023 ior WARNING: unable to statfs() file system. Options: api : MPIIO apiVersion : (3.1) test filename : /scratch/testFile access : file-per-process type : independent segments : 16 ordering in a file : sequential ordering inter file : no tasks offsets nodes : 2 tasks : 4 clients per node : 2 repetitions : 1 xfersize : 1 MiB blocksize : 16 MiB aggregate filesize : 1 GiB Results: access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter ------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ---- write 36.17 36.17 1.44 16384 1024.00 0.000000 28.31 5.24 28.31 0 read 76.78 76.78 0.505938 16384 1024.00 0.000000 13.34 5.24 13.34 0 remove - - - - - - - - 0.000908 0 Max Write: 36.17 MiB/sec (37.93 MB/sec) Max Read: 76.78 MiB/sec (80.51 MB/sec) Summary of all tests: Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Max(OPs) Min(OPs) Mean(OPs) StdDev Mean(s) Stonewall(s) Stonewall(MiB) Test # #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggs(MiB) API RefNum write 36.17 36.17 36.17 0.00 36.17 36.17 36.17 0.00 28.30982 NA NA 0 4 2 1 1 0 1 0 0 16 16777216 1048576 1024.0 MPIIO 0 read 76.78 76.78 76.78 0.00 76.78 76.78 76.78 0.00 13.33615 NA NA 0 4 2 1 1 0 1 0 0 16 16777216 1048576 1024.0 MPIIO 0 Finished : Mon Oct 16 17:24:41 2023
Ok nice!
there are some warning, but because the io operations managed by simgrid do not support statfs i think
Ok now let’s trace ior
for this we need to ask smpirun to have an time idenpendent trace (-trace-ti) and give where we want the result of the trace
nix develop .#expe --command smpirun -trace-ti --cfg=tracing/filename:ior-trace-smpi -np 4 -hostfile hostfile_io -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V
There should now be one `ior-trace-smpi` file and a folder named `ior-trace-smpi_files` with the traces per rank
the `ior-trace-smpi` simply contains the paths to the individual traces
If we open one of those trace, we can see the operation done and their quantity
For example the first 40 lines for rank 1:
0 init 0 compute 0.1304 0 allreduce 1 0 1 0 compute 0.05926 0 compute 0.02646 0 allreduce 1 0 1 0 compute 0.13312 0 bcast 4 0 6 0 compute 0.04292 0 bcast 4096 0 2 0 compute 0.02584 0 gather 1 1 0 1 1 0 bcast 1 0 1 0 compute 0.04256 0 barrier 0 compute 0.02194 0 reduce 1 0 0 0 0 reduce 1 0 0 0 0 bcast 1 0 0 0 bcast 1 0 1 0 compute 2.09196 0 compute 0.13792 0 bcast 1 0 11 0 compute 9.745 0 compute 0.0286 0 compute 0.02298 0 barrier 0 compute 0.02346 0 compute 0.05612 0 compute 0.24912 0 compute 0.06424 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06 0 IO - write 1.04858e+06
Ok! very nice we just need to replay this in SMPI now!
so there is a IO replayer in the simgrid repo: https://framagit.org/simgrid/simgrid/-/tree/master/examples/cpp/replay-io
/* Copyright (c) 2017-2023. The SimGrid Team. All rights reserved. */
/* This program is free software; you can redistribute it and/or modify it
* under the terms of the license (GNU LGPL) which comes with this package. */
#include <simgrid/plugins/file_system.h>
#include <simgrid/s4u.hpp>
#include <xbt/replay.hpp>
#include <xbt/str.h>
#include <boost/algorithm/string/join.hpp>
XBT_LOG_NEW_DEFAULT_CATEGORY(replay_io, "Messages specific for this example");
namespace sg4 = simgrid::s4u;
#define ACT_DEBUG(...) \
if (XBT_LOG_ISENABLED(replay_io, xbt_log_priority_verbose)) { \
std::string NAME = boost::algorithm::join(action, " "); \
XBT_DEBUG(__VA_ARGS__); \
} else \
((void)0)
class Replayer {
static std::unordered_map<std::string, sg4::File*> opened_files;
static void log_action(const simgrid::xbt::ReplayAction& action, double date)
{
if (XBT_LOG_ISENABLED(replay_io, xbt_log_priority_verbose)) {
std::string s = boost::algorithm::join(action, " ");
XBT_VERB("%s %f", s.c_str(), date);
}
}
static sg4::File* get_file_descriptor(const std::string& file_name)
{
std::string full_name = sg4::this_actor::get_name() + ":" + file_name;
return opened_files.at(full_name);
}
public:
explicit Replayer(std::vector<std::string> args)
{
const char* actor_name = args[0].c_str();
if (args.size() > 1) { // split mode, the trace file was provided in the deployment file
const char* trace_filename = args[1].c_str();
simgrid::xbt::replay_runner(actor_name, trace_filename);
} else { // Merged mode
simgrid::xbt::replay_runner(actor_name);
}
}
void operator()() const
{
// Nothing to do here
}
/* My actions */
static void open(simgrid::xbt::ReplayAction& action)
{
std::string file_name = action[2];
double clock = sg4::Engine::get_clock();
std::string full_name = sg4::this_actor::get_name() + ":" + file_name;
ACT_DEBUG("Entering Open: %s (filename: %s)", NAME.c_str(), file_name.c_str());
auto* file = sg4::File::open(file_name, nullptr);
opened_files.try_emplace(full_name, file);
log_action(action, sg4::Engine::get_clock() - clock);
}
static void read(simgrid::xbt::ReplayAction& action)
{
std::string file_name = action[2];
sg_size_t size = std::stoul(action[3]);
double clock = sg4::Engine::get_clock();
sg4::File* file = get_file_descriptor(file_name);
ACT_DEBUG("Entering Read: %s (size: %llu)", NAME.c_str(), size);
file->read(size);
log_action(action, sg4::Engine::get_clock() - clock);
}
static void close(simgrid::xbt::ReplayAction& action)
{
std::string file_name = action[2];
std::string full_name = sg4::this_actor::get_name() + ":" + file_name;
double clock = sg4::Engine::get_clock();
ACT_DEBUG("Entering Close: %s (filename: %s)", NAME.c_str(), file_name.c_str());
auto entry = opened_files.find(full_name);
xbt_assert(entry != opened_files.end(), "File not found in opened files: %s", full_name.c_str());
entry->second->close();
opened_files.erase(entry);
log_action(action, sg4::Engine::get_clock() - clock);
}
};
std::unordered_map<std::string, sg4::File*> Replayer::opened_files;
int main(int argc, char* argv[])
{
sg4::Engine e(&argc, argv);
sg_storage_file_system_init();
xbt_assert(argc > 3,
"Usage: %s platform_file deployment_file [action_files]\n"
"\texample: %s platform.xml deployment.xml actions # if all actions are in the same file\n"
"\t# if actions are in separate files, specified in deployment\n"
"\texample: %s platform.xml deployment.xml",
argv[0], argv[0], argv[0]);
e.load_platform(argv[1]);
e.register_actor<Replayer>("p0");
e.load_deployment(argv[2]);
if (argv[3] != nullptr)
xbt_replay_set_tracefile(argv[3]);
/* Action registration */
xbt_replay_action_register("open", Replayer::open);
xbt_replay_action_register("read", Replayer::read);
xbt_replay_action_register("close", Replayer::close);
e.run();
XBT_INFO("Simulation time %g", sg4::Engine::get_clock());
return 0;
}
for this replayer, there are 3 operations: open, read, and close So this is not enough to replay our trace yet
For now we will just try to print something when we detect a read or write operation in the trace:
/* Copyright (c) 2009-2023. The SimGrid Team. All rights reserved. */
/* This program is free software; you can redistribute it and/or modify it
* under the terms of the license (GNU LGPL) which comes with this package. */
#include "xbt/replay.hpp"
#include <simgrid/s4u/Actor.hpp>
#include "smpi/smpi.h"
#include "xbt/asserts.h"
#include "xbt/str.h"
#include "xbt/log.h"
XBT_LOG_NEW_DEFAULT_CATEGORY(replay_test, "Messages specific for this example");
/* This shows how to extend the trace format by adding a new kind of events.
This function is registered through xbt_replay_action_register() below. */
static void action_io_write(const simgrid::xbt::ReplayAction& args)
{
/* Add your answer to the blah event here.
args is a strings array containing the blank-separated parameters found in the trace for this event instance. */
XBT_INFO("io_write!");
}
static void action_io_read(const simgrid::xbt::ReplayAction& args)
{
/* Add your answer to the blah event here.
args is a strings array containing the blank-separated parameters found in the trace for this event instance. */
XBT_INFO("io_read!");
}
int main(int argc, char* argv[])
{
const auto* properties = simgrid::s4u::Actor::self()->get_properties();
const char* instance_id = properties->at("instance_id").c_str();
const int rank = static_cast<int>(xbt_str_parse_int(properties->at("rank").c_str(), "Cannot parse rank"));
const char* shared_trace =
simgrid::s4u::Actor::self()->get_property("tracefile"); // Cannot use properties because this can be nullptr
const char* private_trace = argv[1];
double start_delay_flops = 0;
if (argc > 2) {
start_delay_flops = xbt_str_parse_double(argv[2], "Cannot parse start_delay_flops");
}
/* Setup things and register default actions */
smpi_replay_init(instance_id, rank, start_delay_flops);
/* Connect your callback function to the "blah" event in the trace files */
xbt_replay_action_register("IO - write", action_io_write);
xbt_replay_action_register("IO - read", action_io_read);
/* The regular run of the replayer */
if (shared_trace != nullptr)
xbt_replay_set_tracefile(shared_trace);
smpi_replay_main(rank, private_trace);
return 0;
}
To compile, run:
nix develop .#dev --command smpicxx replay.cpp -o myreplay -std=c++17
nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_io myreplay
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/replay' to 'ior-trace-smpi' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.3yPMvS' [0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation. The timings will certainly not be accurate. Use the option "--cfg=smpi/host-speed:<flops>" to set its value. Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information. [bob:0:(1) 0.004911] ../src/xbt/xbt_replay.cpp:95: [root/CRITICAL] Replay Error: action IO is unknown, please register it properly in the replay engine myreplay --cfg=smpi/privatization:1 --cfg=surf/precision:1e-9 --cfg=network/model:SMPI --cfg=smpi/tmpdir:/tmp/nix-shell.3yPMvS hosts_with_disks.xml Execution failed with code 134.
Argh the issue seems to be the parsing of the io operation in the trace....
Let’s rename “IO - write” to “io-write”, and same for read
find ./ior-trace-smpi_files -type f -exec sed -i -e 's/IO - write/io-write/g' {} \;
find ./ior-trace-smpi_files -type f -exec sed -i -e 's/IO - read/io-read/g' {} \;
find ./replay.cpp -type f -exec sed -i -e 's/IO - write/io-write/g' {} \;
find ./replay.cpp -type f -exec sed -i -e 's/IO - read/io-read/g' {} \;
and recompile
nix develop .#dev --command smpicxx replay.cpp -o myreplay -std=c++17
nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_io myreplay
and we get the following:
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/privatization' to '1' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/replay' to 'ior-trace-smpi' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/np' to '4' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/hostfile' to 'hostfile_io' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'surf/precision' to '1e-9' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/model' to 'SMPI' [0.000000] [xbt_cfg/INFO] Configuration change: Set 'smpi/tmpdir' to '/tmp/nix-shell.MTIbcL' [0.000000] [smpi_config/INFO] You did not set the power of the host running the simulation. The timings will certainly not be accurate. Use the option "--cfg=smpi/host-speed:<flops >" to set its value. Check https://simgrid.org/doc/latest/Configuring_SimGrid.html#automatic-benchmarking-of-smpi-code for more information. [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write! [bob:0:(1) 0.004911] [replay_test/INFO] io_write!
Ok good! lets try to see what are the available informations
with a small change we can see that the available information are: `[bob:0:(1) 0.004911] [replay_test/INFO] 0 io-write 1.04858e+06`
so the rank, the operation, and the size
so, not awesome for us as it is as we would like to also know which file and other information.
but for now let’s continue and try to write the correct amount somewhere
the doc seems to be here: https://simgrid.org/doc/latest/app_s4u.html#i-o-operations
inspired by https://framagit.org/simgrid/simgrid/-/blob/master/examples/cpp/io-file-system/s4u-io-file-system.cpp, we get
static void action_io_write(const simgrid::xbt::ReplayAction& args) { /* Add your answer to the blah event here. args is a strings array containing the blank-separated parameters found in the trace for this event instance. */ XBT_INFO("io_write!"); std::string filename = "/scratch/testFile"; auto* file = simgrid::s4u::File::open(filename, nullptr); long int io_size = std::atof(args[2].c_str()); sg_size_t write = file->write(io_size); XBT_INFO("Create a %llu bytes file named '%s' on /scratch : %ld", write, filename.c_str(), io_size); file->close(); } static void action_io_read(const simgrid::xbt::ReplayAction& args) { /* Add your answer to the blah event here. args is a strings array containing the blank-separated parameters found in the trace for this event instance. */ XBT_INFO("io_read!"); std::string filename = "/scratch/testFile"; auto* file = simgrid::s4u::File::open(filename, nullptr); long int io_size = std::atof(args[2].c_str()); sg_size_t read = file->read(io_size); XBT_INFO("Read a %llu bytes file named '%s' on /scratch : %ld", read, filename.c_str(), io_size); file->close(); }
But sometimes, we get some weird behavior:
[carl:1:(2) 6.169625] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580 [carl:1:(2) 6.169625] [replay_test/INFO] io_write! [carl:1:(2) 6.190098] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580 [carl:3:(4) 6.190098] [replay_test/INFO] Create a 0 bytes file named '/scratch/testFile' on /scratch : 1048580
like it sees that it must write 1048580 bytes, but write 0
weird
let’s run only on `bob`
nix develop .#dev --command smpirun -replay ior-trace-smpi -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob myreplay
This is much better
ok so there is something to investigate with remote disks (carl mounts a disk from bob)
The tracing seems to be done here: https://framagit.org/simgrid/simgrid/-/blob/master/src/smpi/bindings/smpi_pmpi_file.cpp
so we can try to modify this to have at least the name of the file
It seems like it shouldn’t be too difficult as we can pass anything to the trace function:
src/instr/instr_private.hpp: explicit CpuTIData(const std::string&); // disallow this constructor inherited from TIData
argh actually no, we’ll need to investigate more.
but tomorrow!
ok
so the class for the smpi files are available here: src/smpi/include/smpi_file.hpp
I added the name of the file to the trace, and renamed the “IO - write” to “io-write” directly in simgrid (same for read)
diff --git a/src/smpi/bindings/smpi_pmpi_file.cpp b/src/smpi/bindings/smpi_pmpi_file.cpp index e8dfd51da4..4a519d4795 100644 --- a/src/smpi/bindings/smpi_pmpi_file.cpp +++ b/src/smpi/bindings/smpi_pmpi_file.cpp @@ -99,7 +99,8 @@ int PMPI_File_read(MPI_File fh, void *buf, int count,MPI_Datatype datatype, MPI_ PASS_ZEROCOUNT(count) const SmpiBenchGuard suspend_bench; aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); - TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - read", count * datatype->size())); + auto plop = "io-read " + std::string(fh->name()); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size())); int ret = simgrid::smpi::File::read(fh, buf, count, datatype, status); TRACE_smpi_comm_out(rank_traced); return ret; @@ -124,7 +125,8 @@ int PMPI_File_write(MPI_File fh, const void *buf, int count,MPI_Datatype datatyp PASS_ZEROCOUNT(count) const SmpiBenchGuard suspend_bench; aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); - TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - write", count * datatype->size())); + auto plop = "io-write " + std::string(fh->name()); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size())); int ret = simgrid::smpi::File::write(fh, const_cast<void*>(buf), count, datatype, status); TRACE_smpi_comm_out(rank_traced); return ret; diff --git a/src/smpi/include/smpi_file.hpp b/src/smpi/include/smpi_file.hpp index 60ab2fde22..c74e3e0855 100644 --- a/src/smpi/include/smpi_file.hpp +++ b/src/smpi/include/smpi_file.hpp @@ -44,7 +44,8 @@ class File : public F2C{ int flags() const; MPI_Datatype etype() const; MPI_Comm comm() const; - std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");} + // std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");} + std::string name() const override {return file_ ? std::string(file_->get_path()): std::string("MPI_File");} int sync(); int seek(MPI_Offset offset, int whence);
Alright, we can now integrate the information into the replayer
static void action_io_write(const simgrid::xbt::ReplayAction& args) { std::string filename = args[2]; auto* file = simgrid::s4u::File::open(filename, nullptr); long int io_size = std::atof(args[3].c_str()); sg_size_t write = file->write(io_size); XBT_INFO("Create a %llu bytes file named '%s' on /scratch : %ld", write, filename.c_str(), io_size); file->close(); } static void action_io_read(const simgrid::xbt::ReplayAction& args) { std::string filename = args[2]; auto* file = simgrid::s4u::File::open(filename, nullptr); long int io_size = std::atof(args[3].c_str()); sg_size_t read = file->read(io_size); XBT_INFO("Read a %llu bytes file named '%s' on /scratch : %ld", read, filename.c_str(), io_size); file->close(); }
nice, we can write to the correct files now!
Ok it would now be very nice if we can visualize the smpi run of ior and the replay of the trace.
we will follow: https://simgrid.org/contrib/R_visualization.html
We can generate the pajeng trace:
nix develop .#expe --command smpirun -trace --cfg=tracing/filename:ior-trace-vizu -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V
# Read the data library(tidyverse) library(pajengr) dta <- pajeng_read("ior-trace-vizu") # Manipulate the data dta$state %>% # Remove some unnecessary columns for this example select(-Type, -Imbrication) %>% # Create the nice MPI rank and operations identifiers mutate(Container = as.integer(gsub("rank-", "", Container)), Value = gsub("^PMPI_", "MPI_", Value)) %>% # Rename some columns so it can better fit MPI terminology rename(Rank = Container, Operation = Value) -> df.states # Draw the Gantt Chart df.states %>% ggplot() + # Each MPI operation is becoming a rectangle geom_rect(aes(xmin=Start, xmax=End, ymin=Rank, ymax=Rank + 1, fill=Operation)) + # Cosmetics xlab("Time [seconds]") + ylab("Rank [count]") + theme_bw(base_size=14) + theme( plot.margin = unit(c(0,0,0,0), "cm"), legend.margin = margin(t = 0, unit='cm'), panel.grid = element_blank(), legend.position = "top", legend.justification = "left", legend.box.spacing = unit(0, "pt"), legend.box.margin = margin(0,0,0,0), legend.title = element_text(size=10)) -> plot # Save the plot in a PNG file (dimensions in inches) ggsave("ior-trace.png", plot, width = 10, height = 3)
nix develop .#rshell --command Rscript script.R
Well ok, let’s now try with the replayer (i dont even know if it’s possible)
nix develop .#dev --command smpirun -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob -trace --cfg=tracing/filename:ior-trace-replay -replay ior-trace-smpi2 myreplay
Let’s change the Rscript to accept arguments
nix develop .#rshell --command Rscript script.R ior-trace-vizu ior-trace.png
nix develop .#rshell --command Rscript script.R ior-trace-replay ior-replay.png
hummm nothing on the replay argh
this is probably because we dont use MPI_File_read but read directly …
Ok, let’s try to call MPI function in the replayer
fuck i’ll write a quick makefile i had enough
Adrien had this for his expes
smpirun -replay trace_files.txt -np 16 -platform platform.xml -hostfile hostfile -trace -trace-file $@ --cfg=tracing/uncategorized:yes --cfg=host/model:ptask_L07
So, after some time fighting here is what i have done:
- added tracing on MPI_File_Open and MPI_File_Close
- in the replayer add a hashmap for remembering the open/closed files
- in the replayer add support for open and close operation
diff --git a/src/smpi/bindings/smpi_pmpi_file.cpp b/src/smpi/bindings/smpi_pmpi_file.cpp index e8dfd51da4..292e417d3b 100644 --- a/src/smpi/bindings/smpi_pmpi_file.cpp +++ b/src/smpi/bindings/smpi_pmpi_file.cpp @@ -43,13 +43,21 @@ int PMPI_File_open(MPI_Comm comm, const char *filename, int amode, MPI_Info info if (amode < 0) return MPI_ERR_AMODE; const SmpiBenchGuard suspend_bench; + + aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); + auto plop = "io-open " + std::string(filename); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, 0)); *fh = new simgrid::smpi::File(comm, filename, amode, info); + TRACE_smpi_comm_out(rank_traced); + if ((*fh)->size() != 0 && (amode & MPI_MODE_EXCL)){ delete fh; return MPI_ERR_AMODE; } if(amode & MPI_MODE_APPEND) (*fh)->seek(0,MPI_SEEK_END); return MPI_SUCCESS; } @@ -57,7 +65,11 @@ int PMPI_File_close(MPI_File *fh){ CHECK_NULL(2, MPI_ERR_ARG, fh) CHECK_COLLECTIVE((*fh)->comm(), __func__) const SmpiBenchGuard suspend_bench; + aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); + auto plop = "io-close " + std::string((*fh)->name()); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, 0)); int ret = simgrid::smpi::File::close(fh); + TRACE_smpi_comm_out(rank_traced); *fh = MPI_FILE_NULL; return ret; } @@ -99,7 +111,8 @@ int PMPI_File_read(MPI_File fh, void *buf, int count,MPI_Datatype datatype, MPI_ PASS_ZEROCOUNT(count) const SmpiBenchGuard suspend_bench; aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); - TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - read", count * datatype->size())); + auto plop = "io-read " + std::string(fh->name()); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size())); int ret = simgrid::smpi::File::read(fh, buf, count, datatype, status); TRACE_smpi_comm_out(rank_traced); return ret; @@ -124,7 +137,8 @@ int PMPI_File_write(MPI_File fh, const void *buf, int count,MPI_Datatype datatyp PASS_ZEROCOUNT(count) const SmpiBenchGuard suspend_bench; aid_t rank_traced = simgrid::s4u::this_actor::get_pid(); - TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData("IO - write", count * datatype->size())); + auto plop = "io-write " + std::string(fh->name()); + TRACE_smpi_comm_in(rank_traced, __func__, new simgrid::instr::CpuTIData(plop, count * datatype->size())); int ret = simgrid::smpi::File::write(fh, const_cast<void*>(buf), count, datatype, status); TRACE_smpi_comm_out(rank_traced); return ret; diff --git a/src/smpi/include/smpi_file.hpp b/src/smpi/include/smpi_file.hpp index 60ab2fde22..c74e3e0855 100644 --- a/src/smpi/include/smpi_file.hpp +++ b/src/smpi/include/smpi_file.hpp @@ -44,7 +44,8 @@ class File : public F2C{ int flags() const; MPI_Datatype etype() const; MPI_Comm comm() const; - std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");} + // std::string name() const override {return file_ ? std::string("MPI_File: ")+ std::string(file_->get_path()): std::string("MPI_File");} + std::string name() const override {return file_ ? std::string(file_->get_path()): std::string("MPI_File");} int sync(); int seek(MPI_Offset offset, int whence);
So now we get the same simualtion time between the ior in smpi and the replayer!!!!
nix develop .#expe --command smpirun --cfg=smpi/display-timing:yes -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V [37.580965] [smpi_utils/INFO] Simulated time: 37.581 seconds
nix develop .#expe --command smpirun -trace-ti --cfg=tracing/filename:ior-trace-smpi_open -np 4 -hostfile hostfile_bob -platform hosts_with_disks.xml ./ior.bin -a MPIIO -t 1m -b 16m -s 16 -F -o "/scratch/testFile" -V nix develop .#expe --command smpirun -np 4 -platform hosts_with_disks.xml -hostfile hostfile_bob --cfg=smpi/display-timing:yes -replay ior-trace-smpi_open myreplay [37.581108] [smpi_utils/INFO] Simulated time: 37.5811 seconds.
Pretty good!
Let’s write some “doc”
ok let’s now try with a different platform.
we will have 4 nodes that all mount a disk on a remote host.
then in the hostfile we will have the 4 nodes
it is in scenario `remote-disk`
ok but now when i try with a single file in ior the replayer does not work.
I suppose that is just that we do not trace enough information to replay properly.
so we also need to trace:
- PMPI_File_delete
- PMPI_File_seek_*
- PMPI_File_get_position_*
Also, if we remove the `-V` flag of IOR (useFileView – use MPI_File_set_view), the replayer crashed because we use different finction (PMPI_File_read_at for example) this needs a tiny bit of work, but should be ok we just need to also pass the information about the offset
We’ll see next time
ok let’s implement all the operations remaining
for the file_read_at, the method is MPI_SEEK_SET
Ok implemented the replay to remove the `-V` of IOR
There are still some issue witht the single file…
It is not very clear what the issue is with a single file
there are probably some weird stuff going on with seeks
but otherwise, it seems to work good!