
Commit

sync up with Summit
add more notes to the docs
zingale committed Jan 30, 2024
1 parent 66be1b8 commit 8ab93c3
Showing 2 changed files with 24 additions and 18 deletions.
8 changes: 4 additions & 4 deletions job_scripts/perlmutter/process.xrb
@@ -17,7 +17,7 @@ work_dir=`pwd`
HPSS_DIR=`basename $work_dir`

# set HTAR command
HTAR=/usr/bin/htar
HTAR=htar

# path to the ftime executable -- used for making a simple ftime.out file
# listing the name of the plotfile and its simulation time
@@ -229,12 +229,12 @@ function process_files
datestr=$(date +"%Y%m%d_%H%M_%S")
ftime_files=$(find . -maxdepth 1 -name "ftime.out" -print)
inputs_files=$(find . -maxdepth 1 -name "inputs*" -print)
probin_files=$(find . -maxdepth 1 -name "probin*" -print)
diag_files=$(find . -maxdepth 1 -name "*diag.out" -print)
model_files=$(find . -maxdepth 1 -name "*.hse.*" -print)
slurm_files=$(find . -maxdepth 1 -name "*.slurm" -print)
job_files=$(find . -maxdepth 1 \( -name "*.slurm" -o -name "*.submit" \) -print)
process_files=$(find . -maxdepth 1 -name "process*" -print)

${HTAR} -cvf ${HPSS_DIR}/diag_files_${datestr}.tar ${model_files} ${ftime_files} ${inputs_files} ${probin_files} ${slurm_files} ${process_files} >> /dev/null
${HTAR} -cvf ${HPSS_DIR}/diag_files_${datestr}.tar ${model_files} ${ftime_files} ${inputs_files} ${probin_files} ${job_files} ${process_files} >> /dev/null


# Loop, waiting for plt and chk directories to appear.
34 changes: 20 additions & 14 deletions sphinx_docs/source/nersc-hpss.rst
@@ -67,21 +67,27 @@ The following describes how to use the scripts:
overwriting the stored copy, especially if a purge took place. The
same is done with checkpoint files.

Some additional notes:

Additionally, if the ``ftime`` executable is in your path
(``ftime.cpp`` lives in ``amrex/Tools/Plotfile/``), then
the script will create a file called ``ftime.out`` that lists the name
of the plotfile and the corresponding simulation time.

Finally, right when the job is submitted, the script will tar up all
of the diagnostic files, ``ftime.out``, submission script, inputs and
probin, and archive them on HPSS. The .tar file is given a name that
contains the date-string to allow multiple archives to co-exist. When
``process.xrb`` is running, it creates a lockfile (called
``process.pid``) that ensures that only one instance of the script is
running at any one time. Sometimes if the machine crashes, the
``process.pid`` file will be left behind, in which case, the script
aborts. Just delete that if you know the script is not running.
* If the ``ftime`` executable is in your path (``ftime.cpp`` lives in
  ``amrex/Tools/Plotfile/``), then the script will create a file
  called ``ftime.out`` that lists the name of each plotfile and the
  corresponding simulation time.

* As soon as the job starts, the script tars up the diagnostic files,
  ``ftime.out``, the submission script, and the inputs files, and
  archives them on HPSS. The ``.tar`` file name contains a date
  string so that multiple archives can coexist.

* While ``process.xrb`` is running, it holds a lockfile (called
  ``process.pid``) that ensures that only one instance of the script
  is running at any one time.

.. warning::

   If the job does not terminate normally, the ``process.pid`` file
   can be left behind, in which case the script aborts the next time
   it is started. Delete ``process.pid`` once you have confirmed that
   the script is not running.
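
Before deleting a leftover ``process.pid``, it can help to check whether
its owner is still alive. The following is a minimal sketch, not part
of ``process.xrb``: the helper name ``lock_is_stale`` is hypothetical,
and it assumes that the first line of the lockfile holds the PID of the
process that created it. The check is only meaningful on the node where
the script was started, since PIDs are local to a node.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of process.xrb).
# Returns 0 if the lockfile exists but its owner is no longer running,
# and 1 if the lockfile is missing or its owner is still alive.
# Assumption: the first line of the lockfile is the owner's PID.
lock_is_stale() {
    local pidfile="$1"
    local pid=""

    # No lockfile at all: nothing to clean up.
    [ -f "$pidfile" ] || return 1

    # read still fills $pid when the file lacks a trailing newline.
    read -r pid < "$pidfile" || true

    # An empty or non-numeric entry cannot belong to a live process.
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;
    esac

    # ps -p exits with status 0 only if a process with that PID exists.
    if ps -p "$pid" > /dev/null 2>&1; then
        return 1   # owner is still running: the lock is live
    fi
    return 0       # owner is gone: the lock is stale
}
```

A possible use is ``lock_is_stale process.pid && rm -f process.pid``,
run from the directory that contains the lockfile.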

Jobs in the xfer queue start up quickly. The best approach is to start
one as you start your main job (or make it dependent on the main
