-
Notifications
You must be signed in to change notification settings - Fork 167
BUG: Disk checkpointing with tape.timestepper()
causes incorrect BC evaluation at taping.
#4206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just to follow on from this, here's my diagnosis of what's happening: We have a few blocks, firedrake/firedrake/adjoint_utils/dirichletbc.py Lines 36 to 43 in a9f2c62
Now, when timestep 1 is due to begin, the self.tape.timesteps[timestep - 1].checkpoint(
_store_checkpointable_state, _store_adj_dependencies, self._global_deps) This process goes through the for var in self.adjoint_dependencies.union(self.checkpointable_state):
self._checkpoint[var] = var.saved_output._ad_create_checkpoint() Let's consider when @property
def saved_output(self):
if self.checkpoint is not None:
return self.output._ad_restore_at_checkpoint(self.checkpoint)
else:
return self.output This has the effect of calling firedrake/firedrake/adjoint_utils/dirichletbc.py Lines 45 to 48 in a9f2c62
Because of the use of diff --git a/pyadjoint/tape.py b/pyadjoint/tape.py
index d860025..58e25af 100644
--- a/pyadjoint/tape.py
+++ b/pyadjoint/tape.py
@@ -826,18 +826,21 @@ class TimeStep(list):
# because the global dependencies do not change.
self._checkpoint[var] = var._checkpoint
else:
- self._checkpoint[var] = var.saved_output._ad_create_checkpoint()
+ var.save_output(overwrite=True)
+ self._checkpoint[var] = var._checkpoint
if adj_dependencies:
if self._revised_adj_deps:
for var in self.adjoint_dependencies:
- self._checkpoint[var] = var.saved_output._ad_create_checkpoint()
+ var.save_output(overwrite=True)
+ self._checkpoint[var] = var._checkpoint
else:
# The adjoint dependencies have not been revised yet. At this stage,
# the block nodes are not marked in the path because the control variable(s)
# are not yet determined.
for var in self.adjoint_dependencies.union(self.checkpointable_state):
- self._checkpoint[var] = var.saved_output._ad_create_checkpoint()
+ var.save_output(overwrite=True)
+ self._checkpoint[var] = var._checkpoint
def restore_from_checkpoint(self, from_storage):
"""Restore the block var checkpoints from the timestep checkpoint.""" But again, I don't understand all the layers of complexity and similarly-named concepts, so this probably subtly breaks something else (adjoints or revolve schedules maybe?)... |
This is my understanding based on some reading of the pyadjoint and Firedrake code, but perhaps others more familiar with the pyadjoint internals might correct:
As an aside: this construction is part of what leads to issue dolfin-adjoint/pyadjoint#169. While I think this means that an adjoint calculation should not modify any value referenced by firedrake/firedrake/adjoint_utils/dirichletbc.py Lines 46 to 47 in 6a5f4b1
with
should workaround the bug -- which it seems to in the reproducer. I'm not sure of the best way to avoid the hack in the above as I'm not very familiar with |
I haven't had time to look at this in detail, but my understanding is that |
When using disk checkpointing with
tape.timestepper()
(i.e., withSingleDiskStorageSchedule
), we observe inconsistent behavior in the forward solve even before invoking any functional or derivative evaluation. This appears specifically when updating aDirichletBC
field during a time-stepping loop.Consider the following reproducer:
would produce inconsistent values:
whereas NOT checkpointing to disk, by using
SingleMemoryStorageSchedule
as scheduler (line 4) gives consistent AND different values:The text was updated successfully, but these errors were encountered: