HDF5: Explicit control over chunking (#1591)
* Chunking specification per dataset, explicit specification

Still need to filter out the warnings better

* JSON internal

* Properly warn about unused items

* Maybe expose this publicly?

* CI Fixes

* Documentation

* Testing

* Revert "Maybe expose this publicly?"

This reverts commit f00baa7.

* Remove todo comment
franzpoeschel authored Feb 26, 2024
1 parent a0eca32 commit 30e5bde
Showing 8 changed files with 277 additions and 111 deletions.
1 change: 1 addition & 0 deletions docs/source/backends/hdf5.rst
@@ -65,6 +65,7 @@ Any file object greater than or equal in size to threshold bytes will be aligned

``OPENPMD_HDF5_CHUNKS``: this sets defaults for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
+ Alternatively (or additionally), the chunk size can be specified explicitly per dataset, via the JSON/TOML configuration passed to ``resetDataset()``/``reset_dataset()``.

``OPENPMD_HDF5_COLLECTIVE_METADATA``: this is an option to enable collective MPI calls for HDF5 metadata operations via `H5Pset_all_coll_metadata_ops <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAllCollMetadataOps>`__ and `H5Pset_coll_metadata_write <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetCollMetadataWrite>`__.
By default, this optimization is enabled as it has proven to provide performance improvements.
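To illustrate the per-dataset specification documented above: the chunk size goes into the options string of the ``Dataset`` that is handed to ``resetDataset()``. A minimal sketch, mirroring the example this commit adds to ``examples/5_write_parallel.cpp`` (the record component ``rc`` and the extent are illustrative):

// Declare a dataset with an explicit 10 x 100 chunk size,
// using an inline TOML options string.
Dataset dataset = Dataset(
    determineDatatype<float>(),
    {1000, 300},
    R"(
[hdf5.dataset]
chunks = [10, 100]
)");
rc.resetDataset(dataset); // rc is some openPMD::RecordComponent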
5 changes: 4 additions & 1 deletion docs/source/details/backendconfig.rst
@@ -183,12 +183,15 @@ A full configuration of the HDF5 backend:
.. literalinclude:: hdf5.json
:language: json

- All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
+ All keys found under ``hdf5.dataset`` are applicable globally as well as per dataset.
Explanation of the individual keys:

* ``hdf5.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
The default is ``"auto"``, which selects the chunk size heuristically.
``"none"`` can be used to disable chunking.

+ An explicit chunk size can be specified as a list of positive integers, e.g. ``hdf5.dataset.chunks = [10, 100]``. Note that this specification should only be used per-dataset, e.g. in ``resetDataset()``/``reset_dataset()``.

Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
* ``hdf5.vfd.type`` selects the HDF5 virtual file driver.
Currently available are:
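The same keys can instead be applied globally by passing the configuration when opening the ``Series``. A minimal sketch (the file name is illustrative):

// Heuristic chunking as the global default for every HDF5 dataset
// in this Series, given as an inline JSON options string.
Series series(
    "data_%T.h5",
    Access::CREATE,
    R"({"hdf5": {"dataset": {"chunks": "auto"}}})");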
8 changes: 7 additions & 1 deletion examples/5_write_parallel.cpp
@@ -54,6 +54,9 @@ type = "subfiling"
ioc_selection = "every_nth_rank"
stripe_size = 33554432
stripe_count = -1
+ [hdf5.dataset]
+ chunks = "auto"
)";

// open file for writing
@@ -81,7 +84,10 @@ stripe_count = -1
// example 1D domain decomposition in first index
Datatype datatype = determineDatatype<float>();
Extent global_extent = {10ul * mpi_size, 300};
- Dataset dataset = Dataset(datatype, global_extent);
+ Dataset dataset = Dataset(datatype, global_extent, R"(
+ [hdf5.dataset]
+ chunks = [10, 100]
+ )");

if (0 == mpi_rank)
cout << "Prepared a Dataset of size " << dataset.extent[0] << "x"
2 changes: 1 addition & 1 deletion include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp
@@ -118,9 +118,9 @@ class HDF5IOHandlerImpl : public AbstractIOHandlerImpl
#endif

json::TracingJSON m_config;
+ std::optional<nlohmann::json> m_buffered_dataset_config;

private:
- std::string m_chunks = "auto";
struct File
{
std::string name;
4 changes: 4 additions & 0 deletions include/openPMD/auxiliary/JSON_internal.hpp
@@ -91,6 +91,7 @@ namespace json
* @return nlohmann::json const&
*/
nlohmann::json const &getShadow() const;
+ nlohmann::json &getShadow();

/**
* @brief Invert the "shadow", i.e. a copy of the original JSON value
@@ -247,5 +248,8 @@
*/
nlohmann::json &
merge(nlohmann::json &defaultVal, nlohmann::json const &overwrite);

+ nlohmann::json &filterByTemplate(
+     nlohmann::json &defaultVal, nlohmann::json const &positiveMask);
} // namespace json
} // namespace openPMD
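``filterByTemplate`` is declared above without a documenting comment. Below is a minimal sketch of the mask-based filtering that the name and the ``positiveMask`` parameter suggest; an illustration under our own assumptions, not the actual implementation:

#include <nlohmann/json.hpp>

// Sketch: keep in `value` only those object keys that also occur in
// `mask`, recursing into nested objects. The real filterByTemplate()
// may differ, e.g. in its in-place semantics or array handling.
nlohmann::json filterByMaskSketch(
    nlohmann::json const &value, nlohmann::json const &mask)
{
    if (!value.is_object() || !mask.is_object())
    {
        return value; // non-objects pass through unchanged
    }
    nlohmann::json result = nlohmann::json::object();
    for (auto it = value.begin(); it != value.end(); ++it)
    {
        if (auto m = mask.find(it.key()); m != mask.end())
        {
            result[it.key()] = filterByMaskSketch(it.value(), *m);
        }
    }
    return result;
}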