HDF5: Explicit control over chunking #1591

Merged · 9 commits · Feb 26, 2024
1 change: 1 addition & 0 deletions docs/source/backends/hdf5.rst
@@ -65,6 +65,7 @@ Any file object greater than or equal in size to threshold bytes will be aligned

``OPENPMD_HDF5_CHUNKS``: this sets defaults for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
+The chunk size can alternatively (or additionally) be set explicitly per dataset, via a dataset-specific entry in the JSON/TOML configuration passed to ``resetDataset()``/``reset_dataset()``.
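
For illustration, requesting a per-dataset chunk shape could look like this in C++ (the file name, the record ``"E"``/``"x"``, and the extents are made-up placeholders, not part of this change):

#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main()
{
    Series series("data.h5", Access::CREATE);

    // Request an explicit 100 x 100 chunk shape for this one dataset;
    // datasets without this key keep the global default ("auto" heuristic).
    auto E_x = series.iterations[0].meshes["E"]["x"];
    E_x.resetDataset(Dataset(Datatype::FLOAT, {1000, 1000}, R"(
        [hdf5.dataset]
        chunks = [100, 100]
    )"));

    series.flush();
}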

``OPENPMD_HDF5_COLLECTIVE_METADATA``: this is an option to enable collective MPI calls for HDF5 metadata operations via `H5Pset_all_coll_metadata_ops <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAllCollMetadataOps>`__ and `H5Pset_coll_metadata_write <https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetCollMetadataWrite>`__.
By default, this optimization is enabled as it has proven to provide performance improvements.
5 changes: 4 additions & 1 deletion docs/source/details/backendconfig.rst
@@ -183,12 +183,15 @@ A full configuration of the HDF5 backend:
.. literalinclude:: hdf5.json
:language: json

-All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
+All keys found under ``hdf5.dataset`` are applicable globally as well as per dataset.
Explanation of the individual keys:

* ``hdf5.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
The default is ``"auto"`` for a heuristic.
``"none"`` can be used to disable chunking.

+An explicit chunk size can be specified as a list of positive integers, e.g. ``hdf5.dataset.chunks = [10, 100]``.
+Note that such an explicit specification should only be used per dataset, e.g. in ``resetDataset()``/``reset_dataset()``.

Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
* ``hdf5.vfd.type`` selects the HDF5 virtual file driver.
Currently available are:
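
To illustrate the global-versus-per-dataset scope of ``hdf5.dataset.chunks`` described above, a minimal C++ sketch (file and record names are invented; the JSON option strings are equivalent to the TOML form used elsewhere in this PR):

#include <openPMD/openPMD.hpp>

using namespace openPMD;

int main()
{
    // Global default for every dataset in this series: keep the "auto"
    // chunking heuristic.
    Series series(
        "data.h5",
        Access::CREATE,
        R"({"hdf5": {"dataset": {"chunks": "auto"}}})");

    // Per-dataset override: this record component alone gets explicit chunks.
    auto rho = series.iterations[0].meshes["rho"][RecordComponent::SCALAR];
    rho.resetDataset(Dataset(
        Datatype::DOUBLE,
        {512, 512},
        R"({"hdf5": {"dataset": {"chunks": [64, 512]}}})"));

    series.flush();
}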
8 changes: 7 additions & 1 deletion examples/5_write_parallel.cpp
@@ -54,6 +54,9 @@ type = "subfiling"
ioc_selection = "every_nth_rank"
stripe_size = 33554432
stripe_count = -1
+
+[hdf5.dataset]
+chunks = "auto"
)";

// open file for writing
@@ -81,7 +84,10 @@ stripe_count = -1
    // example 1D domain decomposition in first index
    Datatype datatype = determineDatatype<float>();
    Extent global_extent = {10ul * mpi_size, 300};
-    Dataset dataset = Dataset(datatype, global_extent);
+    Dataset dataset = Dataset(datatype, global_extent, R"(
+[hdf5.dataset]
+chunks = [10, 100]
+)");

    if (0 == mpi_rank)
        cout << "Prepared a Dataset of size " << dataset.extent[0] << "x"
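
A note on the chunk shape chosen above: with the 1D decomposition in the first index, each of the ``mpi_size`` ranks writes a ``{10, 300}`` slab, so the explicit ``[10, 100]`` chunk shape tiles every rank's slab into exactly three chunks and keeps chunk boundaries aligned with rank boundaries — a reading of the intent, not stated explicitly in the diff.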
2 changes: 1 addition & 1 deletion include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp
@@ -118,9 +118,9 @@ class HDF5IOHandlerImpl : public AbstractIOHandlerImpl
#endif

    json::TracingJSON m_config;
+    std::optional<nlohmann::json> m_buffered_dataset_config;

private:
-    std::string m_chunks = "auto";
    struct File
    {
        std::string name;
4 changes: 4 additions & 0 deletions include/openPMD/auxiliary/JSON_internal.hpp
@@ -91,6 +91,7 @@ namespace json
     * @return nlohmann::json const&
     */
    nlohmann::json const &getShadow() const;
+    nlohmann::json &getShadow();

    /**
     * @brief Invert the "shadow", i.e. a copy of the original JSON value
@@ -247,5 +248,8 @@
     */
    nlohmann::json &
    merge(nlohmann::json &defaultVal, nlohmann::json const &overwrite);
+
+    nlohmann::json &filterByTemplate(
+        nlohmann::json &defaultVal, nlohmann::json const &positiveMask);
} // namespace json
} // namespace openPMD
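
The semantics of ``filterByTemplate`` are not spelled out in this hunk; judging by the name and signature, a plausible reading is "recursively keep only those keys of ``defaultVal`` that also occur in ``positiveMask``". A self-contained sketch of that assumed behavior follows — an illustration, not the PR's implementation:

#include <nlohmann/json.hpp>

#include <iostream>

// Assumed behavior only: recursively drop every key of `value` that has no
// counterpart in `mask`; non-object leaves are kept unchanged.
nlohmann::json &
filterByTemplateSketch(nlohmann::json &value, nlohmann::json const &mask)
{
    if (!value.is_object() || !mask.is_object())
    {
        return value;
    }
    for (auto it = value.begin(); it != value.end();)
    {
        auto masked = mask.find(it.key());
        if (masked == mask.end())
        {
            it = value.erase(it); // key not in the mask: filter it out
        }
        else
        {
            filterByTemplateSketch(it.value(), *masked);
            ++it;
        }
    }
    return value;
}

int main()
{
    nlohmann::json defaults = {
        {"dataset", {{"chunks", "auto"}, {"unrelated", 42}}}, {"vfd", "posix"}};
    nlohmann::json mask = {{"dataset", {{"chunks", nullptr}}}};
    filterByTemplateSketch(defaults, mask);
    std::cout << defaults.dump() << '\n'; // {"dataset":{"chunks":"auto"}}
}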