Skip to content

Generic way of handling nested derivatives #762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mattsignorelli opened this issue Mar 30, 2025 · 3 comments
Open

Generic way of handling nested derivatives #762

mattsignorelli opened this issue Mar 30, 2025 · 3 comments
Labels
backend Related to one or more autodiff backends core Related to the core utilities of the package wontfix This will not be worked on

Comments

@mattsignorelli
Copy link
Contributor

mattsignorelli commented Mar 30, 2025

GTPSA, designed for high order AD using the truncated power series TPS structure, does not support nesting of derivatives. This is intentional because at high orders it is it is always faster to use a TPS with higher order truncation order, than to nest TPSs of low truncation orders. However, this can make it challenging to compute in a backend agnostic way nested derivatives. Consider the following scenario:

using DifferentiationInterface, ForwardDiff, GTPSA

function foo(x0, k)
  x_1 = 1*x0[1] + 2*x0[2] + 5*k*x0[1] + 6*k*x0[2]
  x_2 = 3*x0[1] + 4*x0[2] + 7*k*x0[1] + 8*k*x0[2]
  return [x_1, x_2]
end

k = 1
x0 = [0, 0]

x = foo(x0, k)
# Now, suppose we want the dependence of x on the parameter k:
# Using ForwardDiff:
derivative(k->jacobian(xk->foo(xk[1:2],xk[3]), AutoForwardDiff(), [0,0,k]), AutoForwardDiff(), 0)
#= Output:
2×3 Matrix{Int64}:
 5  6  0
 7  8  0
=#


# To use GTPSA (not possible now using Differentiation interface):
const D = Descriptor(3,2) # 3 variables, 2nd order
Δx = @vars(D)[1:2]
Δk = @vars(D)[3]
x_TPS = foo(x0+Δx, k+Δk)

# GTPSA.jacobian, just extracts the coefficients of the TPS, while derivative 
# returns a TPS after taking a derivative wrt variable 3
GTPSA.jacobian(GTPSA.deriv.(x_TPS, 3))
#= Output:
2×3 Matrix{Float64}:
 5.0  6.0  0.0
 7.0  8.0  0.0
=#

One option I imagine that could work is to use a flag for the internal jacobian, specifying that there is nesting. Then, for nested Jacobian, instead of return a matrix of Float64s, GTPSA would return a matrix of its TPS type, which then the outer jacobian can extract derivatives from. Though this solution would require the outermost jacobian to know the number of internal derivatives/order at which to truncated the TPS at, which is not the most generic. I wonder if there has been any thought or work towards finding a generic way to handle nested derivatives?

@gdalle
Copy link
Member

gdalle commented Mar 30, 2025

Though this solution would require the outermost jacobian to know the number of internal derivatives/order at which to truncated the TPS at, which is not the most generic.

You hit the nail on the head with this remark. In general, we can't anticipate how many layers of nesting there will be, or with respect to which variables users will choose to differentiate some code. I know that Lux.jl has some clever utilities for this (https://lux.csail.mit.edu/stable/manual/nested_autodiff) but they're mostly hacks enabled by the existence of a known abstract type for layers (correct me if I'm wrong @avikpal).
To sum up, I don't think there is a good way to implement nesting in DI.

@mattsignorelli
Copy link
Contributor Author

mattsignorelli commented Mar 30, 2025

To sum up, I don't think there is a good way to implement nesting in DI.

Makes sense. Yes, maybe one option would be to allow a flag to return the "unprocessed" version of the derivatives. E.g. for ForwardDiff, all of the Dual numbers including the nested duals themselves, and for GTPSA I could return the TPSs. Then we could use some generic "getter" functions to extract certain derivatives from the unprocessed versions, like my GTPSA.jacobian above. But I'm admittedly not super familiar with how the other backends work in practice, so not sure if that is even doable or not

In accelerator physics our bread and butter is looking at how small variations to parameters affect the transport map (which to first order is just a Jacobian of the particle coordinates through the machine). So with this flag set, the Jacobian would be a matrix of TPS types, instead of just scalars, for example. The AutoGTPSA type allows one to specify the truncation order manually, so that could be set per use case.

@gdalle
Copy link
Member

gdalle commented Mar 31, 2025

Honestly I don't think this is within the scope of DI at the moment. The goal is to provide a set of generic utilities, but higher-order autodiff is useful to much fewer people than what is currently in the package. I'm leaving the issue open but I have no plans to work on this, nor would I know how to achieve it.

@gdalle gdalle added wontfix This will not be worked on backend Related to one or more autodiff backends core Related to the core utilities of the package labels Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Related to one or more autodiff backends core Related to the core utilities of the package wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

2 participants