Getting multiple forward or adjoint directional derivatives with one call. #1902
Would it be possible to calculate and cache the values upon the first invocation of the respective API functions?
Yes, it's possible, but often not desired or practical. It would require storing every intermediate operation used in the calculation, which may require too much memory.
To add to this: Think of the case where the BLT sorting in a Modelica tool results in a series of Newton iterations. In the forward/adjoint propagation, you would need to do linear solves using the same linearized system (transposed for adjoints). Multiple forward/adjoint directions just correspond to multiple right-hand sides of the linear systems. Working with multiple right-hand sides is usually more efficient. Also, if you have to redo the Jacobian calculation and/or factorization, this cost will be the same regardless of the number of directions.
To explain the proposal, I think one should view it as an interface that computes J*V for a matrix V instead of a vector v (and similarly for the adjoint). I can see that it might offer some minor advantage (especially for adjoints), but it is not clear to me that it is needed, and I can see some alternatives.
However, the Jacobian factorization is normally the same one as used for computing the values, so either one could cache it from then, or one could consider an interface that computes both (at least for forward). If the Jacobian is recomputed and refactorized during the solution it is messier: either all factored Jacobians need to be stored (typically less than a handful) or one has to "cheat". Sending in multiple directions requires that all of the directions are known at the same time, whereas caching would handle the different directions being computed in sequence. If we are concerned that caching takes memory and time when not used, one could consider doing something special before the value call.
FMI-Design meeting:
To take a step back: Fundamentally, what is important is that we can calculate multiple directional derivatives efficiently. If that can be resolved with caching inside the FMUs, doing it in one call is not necessary. I think that there are fundamental reasons why a single call can be faster than multiple calls (such as being able to use matrix operations internally and needing less caching/checkpointing), but caching across multiple calls might be fast enough.
Comment: this could also be done in a layered standard that defines new additional functions. If it is to be included in a minor version, a working group should work on a proposal.
(From a discussion with @jaeandersson on efficiency improvements for using FMI in optimization problems)
As a very simple example, consider the scalar function f(x) = sin(x). The forward or adjoint directional derivative is df1(x, v) = cos(x)*v. If you can calculate two directional derivatives at once, you have df2(x, v1, v2) = (cos(x)*v1, cos(x)*v2). df2 is a lot cheaper than two calls to df1, since cos(x) only needs to be calculated once and trigonometric functions are an order of magnitude more expensive than addition/multiplication. The same will hold for the very large functions resulting from Modelica tools. Keep in mind that the intermediate operations and their linearizations (cos(x) above) usually have to be recalculated between calls, as otherwise memory use can blow up.
In terms of API, you could add this support in a backwards-compatible way by permitting nSeed and nSensitivity to be the same multiple of the number of knowns and unknowns, respectively, in fmi3GetDirectionalDerivative (and, conversely, of the number of unknowns and knowns in fmi3GetAdjointDerivative).