-
Notifications
You must be signed in to change notification settings - Fork 7
Add ADIOS2 IO checkpoint support #173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 2 out of 16 changed files in this pull request and generated no comments.
Files not reviewed (14)
- CMakeLists.txt: Language not supported
- cmake/adios2/FindADIOS2.cmake: Language not supported
- cmake/adios2/downloadBuildAdios2.cmake.in: Language not supported
- docs/source/user/advanced_build.rst: Language not supported
- docs/source/user/index.rst: Language not supported
- docs/source/user/input_file.rst: Language not supported
- src/CMakeLists.txt: Language not supported
- src/adios2/checkpoint_dummy.f90: Language not supported
- src/case/base_case.f90: Language not supported
- src/common.f90: Language not supported
- src/config.f90: Language not supported
- src/solver.f90: Language not supported
- tests/CMakeLists.txt: Language not supported
- tests/test_adios2_read_write.f90: Language not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick review an focused mainly on minor things, API, and general strategy. I'll do another one focusing on the design in the following days. For now I think the separation between adios2 and the checkpoint manager is very good in terms of design and can potantially allow a separate writer than adios2 in the future.
@@ -90,10 +95,20 @@ subroutine case_init(self, backend, mesh, host_allocator) | |||
|
|||
self%solver = init(backend, mesh, host_allocator) | |||
|
|||
call self%initial_conditions() | |||
self%checkpoint_mgr = create_checkpoint_manager(MPI_COMM_WORLD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add the following in m_checkpoint_manager
interface checkpoint_manager_t
module procedure create_checkpoint_manager
end interface checkpoint_manager_t
And then this line would be
self%checkpoint_mgr = checkpoint_manager_t(MPI_COMM_WORLD)
and help us avoid use m_checkpoint_manager, only :: create_checkpoint_manager
bit.
call self%checkpoint_mgr%handle_restart(self%solver, MPI_COMM_WORLD) | ||
|
||
if (.not. self%checkpoint_mgr%is_restart()) call self%initial_conditions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like handle_restart
does nothing if the simulation is not restarting, so it will be better to have these two in a single if else block based on is_restart()
.
@@ -123,6 +124,7 @@ function init(backend, mesh, host_allocator) result(solver) | |||
solver%n_iters = solver_cfg%n_iters | |||
solver%n_output = solver_cfg%n_output | |||
solver%ngrid = product(solver%mesh%get_global_dims(VERT)) | |||
solver%current_iter = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is already initialised to 0 when declared so this line can be removed.
end if | ||
end if | ||
|
||
file = self%adios2_writer%open(filename, adios2_mode_write, comm_to_use) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the file overwritten with this call or at least the data inside would be overwritten with the following write_data
calls? I'm wondering because just above you inquire if the file exists and delete manually if so.
Also this is what we should do later on but @slaizet was mentioning something like keeping the last checkpoint until a new one is successfully written to make sure if something goes wrong with writing after deleting we don't lose it.
call self%generate_coordinates( & | ||
solver, file, shape_dims, start_dims, count_dims & | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid writing coordinates to each snapshot file? We have u, v, w and together with this x, y, z coordinates it would double the total size of a snapshot. With structured format paraview only requires 3x 1D arrays with dimensions (nx), (ny), (nz) individually to construct the entire field, saving quite a lot compared to a (nx, ny, nz) sized array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also allocation/deallocation inside generate_coordinates
would kick the memory requirement +3 units each time we write and this can trigger out of memory if the simulation barely fits.
do i_field = 1, size(field_names) | ||
host_field => get_field_data(solver, trim(field_names(i_field))) | ||
if (.not. associated(host_field)) cycle | ||
|
||
call write_single_field(trim(field_names(i_field)), host_field) | ||
end do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to change the arguments of write_fields
subroutine a little to make it more functional, for example when outputting a field other than u/v/w
. Instead of passing solver and implicitly handling u/v/w
via solver%u/v/w
, we can pass an array of field_t
, sized same as field_names
such that they correspond to the name of each individual field. Then it will be easy to output scalar fields or any other field we're interested in just by changing the arguments.
This PR add checkpoint/restart and snapshot functionality to x3d2 using ADIOS2. This enables efficient parallel I/O for large-scale simulations.
Implementation details:
Build system changes:
ENABLE_ADIOS2
CMake optionUSE_SYSTEM_ADIOS2
to control using system vs. built-in ADIOS2Future work: