Random-access for variable-based encoding (i.e. use SetStepSelection for ADIOS2 steps) #1706

Merged
merged 33 commits into from
Feb 20, 2025
Commits
1e4976f
Add preparseSnapshots
franzpoeschel Dec 16, 2024
559a97e
Serial ADIOS2 implementation for readAttributeAllsteps
franzpoeschel Dec 16, 2024
d561662
Feature-complete, untested
franzpoeschel Dec 16, 2024
9e8a590
Fixes
franzpoeschel Dec 16, 2024
eee83f6
wip: parallel impl. for readAttributeAllsteps
franzpoeschel Dec 18, 2024
5a780a9
Simplify, cleanup
franzpoeschel Dec 19, 2024
d9d1657
wip testing
franzpoeschel Dec 19, 2024
a3cabdd
CI fixes
franzpoeschel Dec 19, 2024
f8a9e19
Fix test
franzpoeschel Dec 19, 2024
7353439
Add MPI_CHECK
franzpoeschel Dec 19, 2024
926581e
Stricter warning about group-based encoding in BP5
franzpoeschel Jan 13, 2025
0da9a01
variableBased as default in ADIOS2
franzpoeschel Jan 13, 2025
432fe2e
Add error check for non-homogeneous datasets
franzpoeschel Jan 14, 2025
1c230e1
Reset steps before reading the rankTable
franzpoeschel Jan 14, 2025
544e765
Use writeIterations() in MPI Benchmark
franzpoeschel Jan 14, 2025
547546c
Use group tables by default
franzpoeschel Jan 14, 2025
313b9a0
Add iterate_nonstreaming_series test to parallel tests
franzpoeschel Jan 14, 2025
30fe9b3
Remove std::cout calls
franzpoeschel Jan 14, 2025
e535762
wip: use variable-based encoding in tests
franzpoeschel Jan 14, 2025
185d023
CI fixes
franzpoeschel Jan 15, 2025
02a4fd4
Fix Windows testing
franzpoeschel Feb 12, 2025
126664c
Cleanup
franzpoeschel Feb 13, 2025
182da14
Warn unimplemented modifiable attributes for now
franzpoeschel Feb 13, 2025
c2e2b2b
Fix test???
franzpoeschel Feb 13, 2025
9c6f4aa
Add this to variableBasedSeries test
franzpoeschel Feb 13, 2025
1794b3a
Remove debugging line
franzpoeschel Feb 13, 2025
3c95b2c
Use own flag for the warning
franzpoeschel Feb 14, 2025
dce876e
Documentation
franzpoeschel Feb 14, 2025
816ae26
Skip inhomogeneous datasets instead of outright failing
franzpoeschel Feb 14, 2025
29f6403
add test for default iteration encoding
franzpoeschel Feb 14, 2025
ffd887d
Move test to its own file
franzpoeschel Feb 14, 2025
e15ab8c
Don't distinguish ADIOS2 v2.9 any more
franzpoeschel Feb 17, 2025
b9158a6
Some more documentation
franzpoeschel Feb 19, 2025
9 changes: 9 additions & 0 deletions CMakeLists.txt
@@ -787,6 +787,15 @@ if(openPMD_BUILD_TESTING)
test/Files_SerialIO/close_and_reopen_test.cpp
test/Files_SerialIO/filebased_write_test.cpp
)
elseif(${test_name} STREQUAL "ParallelIO" AND openPMD_HAVE_MPI)
list(APPEND ${out_list}
test/Files_ParallelIO/read_variablebased_randomaccess.cpp
test/Files_ParallelIO/iterate_nonstreaming_series.cpp
)
elseif(${test_name} STREQUAL "Core")
list(APPEND ${out_list}
test/Files_Core/automatic_variable_encoding.cpp
)
endif()
endmacro()

3 changes: 2 additions & 1 deletion docs/source/usage/concepts.rst
@@ -48,7 +48,7 @@ openPMD-api implements various file-formats (backends) and encoding strategies f
**Iteration encoding:** The openPMD-api can encode iterations in different ways.
The method ``Series::setIterationEncoding()`` (C++) or ``Series.set_iteration_encoding()`` (Python) may be used in writing for selecting one of the following encodings explicitly:

- * **group-based iteration encoding:** This encoding is the default.
+ * **group-based iteration encoding:** This encoding is the default for HDF5 and JSON. In ADIOS2, variable-based encoding is preferred when possible due to better performance characteristics, see below.
It creates a separate group in the hierarchy of the openPMD standard for each iteration.
As an example, all data pertaining to iteration 0 may be found in group ``/data/0``, for iteration 100 in ``/data/100``.
* **file-based iteration encoding:** A unique file on the filesystem is created for each iteration.
@@ -57,6 +57,7 @@ The method ``Series::setIterationEncoding()`` (C++) or ``Series.set_iteration_en
A padding may be specified by ``"series_%06T.json"`` to create files ``series_000000.json``, ``series_000100.json`` and ``series_000200.json``.
The inner group layout of each file is identical to that of the group-based encoding.
* **variable-based iteration encoding:** This experimental encoding uses a feature of some backends (i.e., ADIOS2) to maintain datasets and attributes in several versions (i.e., iterations are stored inside *variables*).
When creating an ADIOS2 Series with steps (e.g. via ``series.writeIterations()`` / ``series.write_iterations()``), this encoding will be picked as a default instead of group-based encoding due to bad performance characteristics of group-based encoding in ADIOS2.
No iteration-specific groups are created and the corresponding layer is dropped from the openPMD hierarchy.
In backends that do not support this feature, a series created with this encoding can only contain one iteration.

12 changes: 8 additions & 4 deletions docs/source/usage/workflow.rst
@@ -76,10 +76,14 @@ The openPMD-api distinguishes between a number of different access modes:
3. In streaming backends, random-access is not possible.
When using such a backend, the access mode will be coerced automatically to *linear read mode*.
Use of Series::readIterations() is mandatory for access.
- 4. Reading a variable-based Series is only fully supported with *linear access mode*.
- If using *random-access read mode*, the dataset will be considered to only have one single step.
- If the dataset only has one single step, this is guaranteed to work as expected.
- Otherwise, it is undefined which step's data is returned.
+ 4. *Random-access read mode* for a variable-based Series is currently experimental.
+ There is currently only very restricted support for metadata definitions that change across steps:
+
+ 1. Modifiable attributes (except ``/data/snapshot``) can currently not be read. Attributes such as ``/data/time`` that naturally change their value across Iterations will hence not be really well-usable; the last Iteration's value will currently leak into all other Iterations.
+ 2. There is no support for datasets that do not exist in all Iterations. The internal Iteration layouts should be homogeneous.
+ If you need this feature, please contact the openPMD-api developers; implementing this is currently not a priority.
+ Datasets that do not exist in all steps will be skipped at read time (with an error).
+ 3. Datasets with changing extents are supported.

* **Read/Write mode**: Creates a new Series if not existing, otherwise opens an existing Series for reading and writing.
New datasets and iterations will be inserted as needed.
5 changes: 2 additions & 3 deletions examples/5_write_parallel.py
@@ -56,8 +56,7 @@
# in streaming setups, e.g. an iteration cannot be opened again once
# it has been closed.
# `Series.iterations` can be directly accessed in random-access workflows.
- series.iterations[1].open()
- mymesh = series.iterations[1]. \
+ mymesh = series.write_iterations()[1]. \
meshes["mymesh"]

# example 1D domain decomposition in first index
@@ -92,7 +91,7 @@
# The iteration can be closed in order to help free up resources.
# The iteration's content will be flushed automatically.
# An iteration once closed cannot (yet) be reopened.
- series.iterations[1].close()
+ series.write_iterations()[1].close()

if 0 == comm.rank:
print("Dataset content has been fully written to disk")
4 changes: 2 additions & 2 deletions include/openPMD/Datatype.hpp
@@ -513,7 +513,7 @@ inline bool isFloatingPoint(Datatype d)
* @param d Datatype to test
* @return true if complex floating point, otherwise false
*/
- inline bool isComplexFloatingPoint(Datatype d)
+ constexpr inline bool isComplexFloatingPoint(Datatype d)
{
using DT = Datatype;

@@ -554,7 +554,7 @@ inline bool isFloatingPoint()
* @return true if complex floating point, otherwise false
*/
template <typename T>
- inline bool isComplexFloatingPoint()
+ constexpr inline bool isComplexFloatingPoint()
{
Datatype dtype = determineDatatype<T>();

2 changes: 2 additions & 0 deletions include/openPMD/IO/ADIOS/ADIOS2Auxiliary.hpp
@@ -97,6 +97,8 @@ namespace adios_defaults
constexpr const_str str_usesteps = "usesteps";
constexpr const_str str_flushtarget = "preferred_flush_target";
constexpr const_str str_usesstepsAttribute = "__openPMD_internal/useSteps";
constexpr const_str str_useModifiableAttributes =
"__openPMD_internal/useModifiableAttributes";
constexpr const_str str_adios2Schema =
"__openPMD_internal/openPMD2_adios2_schema";
constexpr const_str str_isBoolean = "__is_boolean__";
8 changes: 7 additions & 1 deletion include/openPMD/IO/ADIOS/ADIOS2File.hpp
@@ -79,7 +79,8 @@ struct DatasetReader
BufferedGet &bp,
adios2::IO &IO,
adios2::Engine &engine,
- std::string const &fileName);
+ std::string const &fileName,
+ std::optional<size_t> stepSelection);

static constexpr char const *errorMsg = "ADIOS2: readDataset()";
};
@@ -412,6 +413,8 @@ class ADIOS2File
StreamStatus streamStatus = StreamStatus::OutsideOfStep;

size_t currentStep();
void setStepSelection(std::optional<size_t>);
[[nodiscard]] std::optional<size_t> stepSelection() const;

private:
ADIOS2IOHandlerImpl *m_impl;
@@ -420,8 +423,11 @@ class ADIOS2File
/*
* Not all engines support the CurrentStep() call, so we have to
* implement this manually.
* Note: We don't use a std::optional<size_t> here since the currentStep
* is always being counted.
*/
size_t m_currentStep = 0;
bool useStepSelection = false;

/*
* ADIOS2 does not give direct access to its internal attribute and
26 changes: 19 additions & 7 deletions include/openPMD/IO/ADIOS/ADIOS2IOHandler.hpp
@@ -23,6 +23,7 @@
#include "openPMD/Error.hpp"
#include "openPMD/IO/ADIOS/ADIOS2Auxiliary.hpp"
#include "openPMD/IO/ADIOS/ADIOS2FilePosition.hpp"
#include "openPMD/IO/ADIOS/macros.hpp"
#include "openPMD/IO/AbstractIOHandler.hpp"
#include "openPMD/IO/AbstractIOHandlerImpl.hpp"
#include "openPMD/IO/AbstractIOHandlerImplCommon.hpp"
@@ -190,6 +191,9 @@ class ADIOS2IOHandlerImpl

void readAttribute(Writable *, Parameter<Operation::READ_ATT> &) override;

void readAttributeAllsteps(
Writable *, Parameter<Operation::READ_ATT_ALLSTEPS> &) override;

void listPaths(Writable *, Parameter<Operation::LIST_PATHS> &) override;

void
@@ -431,7 +435,8 @@ class ADIOS2IOHandlerImpl
Offset const &offset,
Extent const &extent,
adios2::IO &IO,
- std::string const &varName)
+ std::string const &varName,
+ std::optional<size_t> stepSelection)
{
{
auto requiredType = adios2::GetType<T>();
@@ -458,6 +463,10 @@ class ADIOS2IOHandlerImpl
throw std::runtime_error(
"[ADIOS2] Internal error: Failed opening ADIOS2 variable.");
}
if (stepSelection.has_value())
{
var.SetStepSelection({*stepSelection, 1});
}
// TODO leave this check to ADIOS?
adios2::Dims shape = var.Shape();
auto actualDim = shape.size();
@@ -533,11 +542,8 @@ namespace detail
struct AttributeReader
{
template <typename T>
- static Datatype call(
- ADIOS2IOHandlerImpl &,
- adios2::IO &IO,
- std::string name,
- Attribute::resource &resource);
+ static Datatype
+ call(adios2::IO &IO, std::string name, Attribute::resource &resource);

template <int n, typename... Params>
static Datatype call(Params &&...);
@@ -562,7 +568,8 @@ namespace detail
ADIOS2IOHandlerImpl *impl,
InvalidatableFile const &,
std::string const &varName,
- Parameter<Operation::OPEN_DATASET> &parameters);
+ Parameter<Operation::OPEN_DATASET> &parameters,
+ std::optional<size_t> stepSelection);

static constexpr char const *errorMsg = "ADIOS2: openDataset()";
};
@@ -854,6 +861,11 @@ class ADIOS2IOHandler : public AbstractIOHandler
return "ADIOS2";
}

bool fullSupportForVariableBasedEncoding() const override
{
return true;
[Review comment, Contributor Author] Fix: return m_engineType == "bp5", BP4 does not support modifiable attributes, I think

[Reply, Contributor Author] It does

> bpls ../samples/variableBasedSeries.bp4 -alte /data/snapshot
Step 0:
  uint64_t  /data/snapshot                                              attr   = 0
Step 1:
  uint64_t  /data/snapshot                                              attr   = 1
Step 2:
  uint64_t  /data/snapshot                                              attr   = 2
Step 3:
  uint64_t  /data/snapshot                                              attr   = 3
Step 4:
  uint64_t  /data/snapshot                                              attr   = 4
Step 5:
  uint64_t  /data/snapshot                                              attr   = 5
Step 6:
  uint64_t  /data/snapshot                                              attr   = 6
Step 7:
  uint64_t  /data/snapshot                                              attr   = 7
Step 8:
  uint64_t  /data/snapshot                                              attr   = 8
Step 9:
  uint64_t  /data/snapshot                                              attr   = 9

}

std::future<void> flush(internal::ParsedFlushParams &) override;
}; // ADIOS2IOHandler
} // namespace openPMD
1 change: 1 addition & 0 deletions include/openPMD/IO/AbstractIOHandler.hpp
@@ -250,6 +250,7 @@ class AbstractIOHandler

/** The currently used backend */
virtual std::string backendName() const = 0;
virtual bool fullSupportForVariableBasedEncoding() const;

std::string directory;
/*
25 changes: 25 additions & 0 deletions include/openPMD/IO/AbstractIOHandlerImpl.hpp
@@ -72,6 +72,13 @@ class AbstractIOHandlerImpl
* IO actions up to the point of closing a step must be performed now.
*
* The advance mode is determined by parameters.mode.
* parameters.mode has type std::variant<AdvanceMode, StepSelection>:
*
* 1. AdvanceMode is for processing steps sequentially. In this case, a step
* is either begun or closed.
* 2. StepSelection is for random-accessing steps. A target step number is
* specified.
*
* The return status code shall be stored as parameters.status.
*/
virtual void advance(Writable *, Parameter<Operation::ADVANCE> &parameters)
@@ -360,6 +367,24 @@ class AbstractIOHandlerImpl
*/
virtual void
readAttribute(Writable *, Parameter<Operation::READ_ATT> &) = 0;
/** Collective task to read modifiable attributes over steps.
*
* Has a default implementation for backends that do not support steps;
* here, the task is relayed to normal READ_ATT.
* This task is key for implementing the preparsing logic needed in
* random-access read mode for variable-encoded ADIOS2 files.
* adios2::Mode::ReadRandomAccess does not support modifiable attributes,
* so this task will instead quickly open the file's metadata in
* adios2::Mode::Read, go through all its steps and register the attribute
* values. Expensive and collective operation, run only once at startup.
* Absolutely necessary for reading /data/snapshot.
* Necessary (but not yet used) for having correct values in attributes
* such as /data/time.
* In future: Let this task preparse the entirety of all modifiable
* attributes.
*/
virtual void readAttributeAllsteps(
Writable *, Parameter<Operation::READ_ATT_ALLSTEPS> &);
/** List all paths/sub-groups inside a group, non-recursively.
*
* The operation should fail if the Writable was not marked written.
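
Illustration (not part of the PR): the doc comment above says that `readAttributeAllsteps` walks through all steps once and registers the attribute values, which is what makes `/data/snapshot` usable in random-access mode. A minimal sketch of what a reader could do with such a per-step result is to invert it into a lookup table from iteration index to step. The helper name `buildIterationToStep` is hypothetical, and the sketch assumes that every step stores exactly one iteration, as in the `bpls` listing quoted in the review thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of openPMD-api.
// Input: the value of /data/snapshot in each step (vector index = step).
// Output: lookup table from iteration index to the step that stores it.
// Assumption: every step stores exactly one iteration.
inline std::map<std::uint64_t, std::size_t>
buildIterationToStep(std::vector<std::uint64_t> const &snapshotPerStep)
{
    std::map<std::uint64_t, std::size_t> iterationToStep;
    for (std::size_t step = 0; step < snapshotPerStep.size(); ++step)
    {
        bool const inserted =
            iterationToStep.emplace(snapshotPerStep[step], step).second;
        if (!inserted)
        {
            throw std::runtime_error(
                "Iteration index appears in more than one step.");
        }
    }
    return iterationToStep;
}

// Usage sketch: iterations 0, 100 and 200 written to steps 0, 1 and 2.
inline void usageExample()
{
    auto const table = buildIterationToStep({0, 100, 200});
    assert(table.at(200) == 2); // iteration 200 is found in step 2
}
```

With such a table, opening iteration 200 in random-access mode amounts to selecting step 2 before reading its datasets.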
55 changes: 51 additions & 4 deletions include/openPMD/IO/IOTask.hpp
@@ -26,13 +26,14 @@
#include "openPMD/Streaming.hpp"
#include "openPMD/auxiliary/Export.hpp"
#include "openPMD/auxiliary/Memory.hpp"
#include "openPMD/auxiliary/TypeTraits.hpp"
#include "openPMD/auxiliary/Variant.hpp"
#include "openPMD/backend/Attribute.hpp"
#include "openPMD/backend/ParsePreference.hpp"

#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <variant>
@@ -72,6 +73,7 @@ OPENPMDAPI_EXPORT_ENUM_CLASS(Operation){
DELETE_ATT,
WRITE_ATT,
READ_ATT,
READ_ATT_ALLSTEPS,
LIST_ATTS,

ADVANCE,
@@ -602,6 +604,38 @@ struct OPENPMDAPI_EXPORT Parameter<Operation::READ_ATT>
std::make_shared<Attribute::resource>();
};

template <>
struct OPENPMDAPI_EXPORT Parameter<Operation::READ_ATT_ALLSTEPS>
: public AbstractParameter
{
Parameter() = default;
Parameter(Parameter &&) = default;
Parameter(Parameter const &) = default;
Parameter &operator=(Parameter &&) = default;
Parameter &operator=(Parameter const &) = default;

std::unique_ptr<AbstractParameter> to_heap() && override
{
return std::unique_ptr<AbstractParameter>(
new Parameter<Operation::READ_ATT_ALLSTEPS>(std::move(*this)));
}

std::string name = "";
std::shared_ptr<Datatype> dtype = std::make_shared<Datatype>();

struct to_vector_type
{
template <typename T>
using type = std::vector<T>;
};
// std::variant<std::vector<T_1>, std::vector<T_2>, ...>
// for all T_i in openPMD::Datatype.
using result_type = typename auxiliary::detail::
map_variant<to_vector_type, Attribute::resource>::type;

std::shared_ptr<result_type> resource = std::make_shared<result_type>();
};

template <>
struct OPENPMDAPI_EXPORT Parameter<Operation::LIST_ATTS>
: public AbstractParameter
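
Illustration (not part of the PR): `result_type` above is described as a variant of `std::vector<T_i>` for every alternative `T_i` of `Attribute::resource`. The sketch below shows one way such a type-level mapping can be written. `map_variant_sketch` and `resource_sketch` are hypothetical stand-ins; the actual `auxiliary::detail::map_variant` in `TypeTraits.hpp` may be implemented differently, and the real `Attribute::resource` has many more alternatives.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for auxiliary::detail::map_variant.
// Applies the type function F::type<T> to every alternative of a variant.
template <typename F, typename Variant>
struct map_variant_sketch;

template <typename F, typename... T>
struct map_variant_sketch<F, std::variant<T...>>
{
    using type = std::variant<typename F::template type<T>...>;
};

// Same shape as the to_vector_type struct in the diff.
struct to_vector_type
{
    template <typename T>
    using type = std::vector<T>;
};

// Reduced stand-in for Attribute::resource (assumption: three alternatives).
using resource_sketch = std::variant<int, double, std::string>;

using result_type = map_variant_sketch<to_vector_type, resource_sketch>::type;

// Each alternative T has become std::vector<T>: one entry per step.
static_assert(std::is_same_v<
              result_type,
              std::variant<
                  std::vector<int>,
                  std::vector<double>,
                  std::vector<std::string>>>);

// Usage sketch: a double-valued attribute read once per step over three steps.
inline void usageMapVariant()
{
    result_type perStep = std::vector<double>{0.0, 0.5, 1.0};
    assert(std::holds_alternative<std::vector<double>>(perStep));
}
```

Keeping the result as a single variant of vectors means the datatype is resolved once for all steps instead of once per step.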
@@ -638,10 +672,23 @@ struct OPENPMDAPI_EXPORT Parameter<Operation::ADVANCE>
new Parameter<Operation::ADVANCE>(std::move(*this)));
}

- //! input parameter
- AdvanceMode mode;
+ struct StepSelection
+ {
+ std::optional<size_t> step;
[Review comment, Member] please include <optional> for stability in this file
[Reply, Contributor Author] Done, I've also removed an unused <map> import

};

// input parameters
/**
* AdvanceMode: Is one of BeginStep/EndStep. Used during writing and in
* linear read mode to step sequentially through steps.
* StepSelection: Used in random-access read mode, jump to the specified
* step. Can be nullopt in order to reset the backend to read
* step-agnostically, e.g. for reading global datasets such as
* /rankTable.
*/
std::variant<AdvanceMode, StepSelection> mode;
[Review comment, Member, @ax3l, Feb 18, 2025] Maybe good to add an inline comment here of the form:
Suggested change:
- std::variant<AdvanceMode, StepSelection> mode;
+ std::variant<AdvanceMode, StepSelection> mode; //! AdvanceMode: LINEAR_READ; StepSelection: for random access support
[Reply, Contributor Author] done

bool isThisStepMandatory = false;
- //! output parameter
+ // output parameter
std::shared_ptr<AdvanceStatus> status =
std::make_shared<AdvanceStatus>(AdvanceStatus::OK);
};
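
Illustration (not part of the PR): the `mode` member above is either an `AdvanceMode` (sequential stepping) or a `StepSelection` (random access, with `nullopt` meaning "read step-agnostically"). A backend has to branch on which alternative is active. The sketch below shows that dispatch with `std::visit`; the types are simplified stand-ins for the ones in the diff, and `describeAdvance` is a hypothetical function that only reports the action instead of performing I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Simplified stand-ins for the openPMD-api types named in the diff.
enum class AdvanceMode
{
    BEGINSTEP,
    ENDSTEP
};
struct StepSelection
{
    std::optional<std::size_t> step;
};
using Mode = std::variant<AdvanceMode, StepSelection>;

// Hypothetical backend hook: reports which action would be taken.
inline std::string describeAdvance(Mode const &mode)
{
    return std::visit(
        [](auto const &m) -> std::string {
            using T = std::decay_t<decltype(m)>;
            if constexpr (std::is_same_v<T, AdvanceMode>)
            {
                // Sequential access: a step is either begun or closed.
                return m == AdvanceMode::BEGINSTEP ? "begin next step"
                                                   : "end current step";
            }
            else
            {
                // Random access: jump to a step, or reset the selection.
                return m.step.has_value()
                    ? "select step " + std::to_string(*m.step)
                    : "reset step selection";
            }
        },
        mode);
}

// Usage sketch: random access to step 3.
inline void usageAdvance()
{
    assert(
        describeAdvance(StepSelection{std::size_t{3}}) == "select step 3");
}
```

The `if constexpr` branch makes the compiler reject the visitor if a third alternative is ever added to the variant without handling `step`, which is one reason to prefer a variant over a pair of loosely coupled fields.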
31 changes: 29 additions & 2 deletions include/openPMD/Iteration.hpp
@@ -52,6 +52,32 @@ namespace internal
propagated to the backend */
};

namespace BeginStepTypes
{
struct DontBeginStep
{};
struct BeginStepSequentially
{};
struct BeginStepRandomAccess
{
size_t step;
};
} // namespace BeginStepTypes

using BeginStep = std::variant<
BeginStepTypes::DontBeginStep,
BeginStepTypes::BeginStepSequentially,
BeginStepTypes::BeginStepRandomAccess>;

namespace BeginStepTypes
{
template <typename T, typename... Args>
constexpr auto make(Args &&...args) -> BeginStep
{
return BeginStep{T{std::forward<Args>(args)...}};
}
} // namespace BeginStepTypes

struct DeferredParseAccess
{
/**
@@ -69,7 +95,7 @@ namespace internal
* (Group- and variable-based parsing shares the same code logic.)
*/
bool fileBased = false;
- bool beginStep = false;
+ BeginStep beginStep = BeginStepTypes::DontBeginStep{};
};

class IterationData : public AttributableData
@@ -305,7 +331,8 @@ class Iteration : public Attributable
std::string const &filePath,
std::string const &groupPath,
bool beginStep);
- void readGorVBased(std::string const &groupPath, bool beginStep);
+ void readGorVBased(
+ std::string const &groupPath, internal::BeginStep const &beginStep);
void read_impl(std::string const &groupPath);
void readMeshes(std::string const &meshesPath);
void readParticles(std::string const &particlesPath);
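
Illustration (not part of the PR): `beginStep` changes from a `bool` to the three-way `BeginStep` variant, so callers such as `readGorVBased` can tell "do not begin a step", "begin the next step sequentially" and "jump to step N" apart. The sketch below repeats the type definitions from the diff in a self-contained form and adds a hypothetical consumer, `stepToSelect`, that extracts the target step only in the random-access case. The `overloaded` helper is the common visitor idiom and is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <variant>

namespace BeginStepTypes
{
struct DontBeginStep
{};
struct BeginStepSequentially
{};
struct BeginStepRandomAccess
{
    std::size_t step;
};
} // namespace BeginStepTypes

using BeginStep = std::variant<
    BeginStepTypes::DontBeginStep,
    BeginStepTypes::BeginStepSequentially,
    BeginStepTypes::BeginStepRandomAccess>;

namespace BeginStepTypes
{
// Same shape as the make() helper in the diff.
template <typename T, typename... Args>
constexpr auto make(Args &&...args) -> BeginStep
{
    return BeginStep{T{std::forward<Args>(args)...}};
}
} // namespace BeginStepTypes

// Common visitor idiom: merge several lambdas into one overload set.
template <typename... Fs>
struct overloaded : Fs...
{
    using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical consumer: the step to select before parsing, if any.
inline std::optional<std::size_t> stepToSelect(BeginStep const &beginStep)
{
    using Result = std::optional<std::size_t>;
    return std::visit(
        overloaded{
            [](BeginStepTypes::BeginStepRandomAccess const &ra) -> Result {
                return ra.step;
            },
            // DontBeginStep and BeginStepSequentially carry no step number.
            [](auto const &) -> Result { return std::nullopt; }},
        beginStep);
}

// Usage sketch: a default-constructed BeginStep holds DontBeginStep.
inline void usageBeginStep()
{
    assert(!stepToSelect(BeginStep{}).has_value());
}
```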