
Add GradVac #249

Status: Open. Wants to merge 31 commits into base: main.
Conversation

@EmileAydar commented Feb 10, 2025

[EDIT: This PR is now about GradVac. See comments below]

This pull request introduces a new GradNorm loss-balancing wrapper to the TorchJD library. The key changes are as follows:

  • A new module (located in torchjd/aggregation/gradnorm_wrapper.py) implements the GradNorm loss-balancing mechanism as described in Chen et al.'s ICML 2018 paper. Unlike other aggregators, this wrapper operates on a list of task losses and computes adaptive loss weights via a learned parameter vector.

  • The wrapper has been updated to accept a device parameter so that it runs seamlessly on both CPU and CUDA. Corresponding tests now run on multiple devices.
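For context, the loss-balancing mechanism from Chen et al.'s GradNorm paper can be sketched in a few lines. The following is an illustrative numpy sketch, not the actual `gradnorm_wrapper.py` code; `gradnorm_step` and its argument names are hypothetical:

```python
import numpy as np

def gradnorm_step(weights, grad_norms, loss_ratios, alpha=1.5, lr=0.025):
    """One GradNorm-style update of the task weights (illustrative sketch).

    grad_norms:  per-task gradient norms at the shared layer
    loss_ratios: L_i(t) / L_i(0), the loss ratio of each task
    """
    r = loss_ratios / loss_ratios.mean()      # relative inverse training rates
    weighted = weights * grad_norms           # weighted gradient norms G_W^(i)
    target = weighted.mean() * r ** alpha     # target norms (held constant)
    # subgradient of L_grad = sum_i |w_i * g_i - target_i| w.r.t. w_i
    step = np.sign(weighted - target) * grad_norms
    new_weights = weights - lr * step
    # renormalize so the weights keep summing to the number of tasks
    return new_weights * len(new_weights) / new_weights.sum()
```

As expected, a task whose gradient norm is smaller than average receives a larger weight after the update.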

The tests in tests/unit/aggregation/test_gradnorm_wrapper.py have been developed to ensure high coverage, as requested. They:

  • Validate basic functionality (forward pass, error handling, reset behavior).
  • Verify that the internal loss-scale parameters update correctly (including convergence behavior).
  • Handle edge cases such as zero gradients.
  • Integrate with existing TorchJD aggregators (e.g., Sum and MGDA) by re-weighting losses and stacking them into a matrix.
  • Run on both CPU and CUDA (with necessary adjustments for deterministic algorithm settings on CUDA).
Overall, the tests now achieve ~96% coverage for this module.
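As a side note on the CUDA determinism point: deterministic runs in PyTorch generally require both a cuBLAS environment variable and the `torch.use_deterministic_algorithms(True)` flag. A minimal sketch of the environment setup (the exact settings used in these tests may differ):

```shell
# Deterministic cuBLAS kernels require this workspace config (CUDA >= 10.2)
export CUBLAS_WORKSPACE_CONFIG=:4096:8
```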

The documentation has been updated with detailed docstrings and a usage example showing how to use GradNormWrapper both standalone and in combination with an aggregator like MGDA.

Changelog and Version Update:

  • The CHANGELOG has been updated in the [Unreleased] section and a new version header ([0.5.1] - 2025-02-10) has been added. The version in pyproject.toml is updated to 0.5.1.

Please review these changes at your convenience. If any further adjustments are needed, feel free to let me know. Once approved, I will merge this PR and update the repository accordingly.

Thank you for your time and consideration!

Best regards,
Emile

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines | Coverage Δ
src/torchjd/aggregation/gradvac.py | 100.00% <100.00%> (ø)

@ValerianRey (Contributor) commented

Thanks for your interest and effort on this PR!

It seems there is a misunderstanding about the role of Aggregators in TorchJD. An Aggregator should be applied to a Jacobian matrix, not a matrix of losses. To properly combine GradNorm with Jacobian-based methods (such as MGDA, UPGrad, etc.), I think the process should be:

  1. Reweight the losses using GradNorm.
  2. Apply torchjd.backward or torchjd.mtl_backward to the reweighted losses. This will compute the Jacobian of these losses and aggregate it using some aggregator.

Do you think your intended use-case would work by following those two steps?
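Those two steps can be illustrated on a toy problem. The numpy sketch below uses analytic gradients and a plain row mean as a stand-in aggregator; TorchJD would instead compute the Jacobian via autodiff and aggregate it with an aggregator such as MGDA or UPGrad:

```python
import numpy as np

# Step 1: weights produced by a loss-balancing scheme such as GradNorm
loss_weights = np.array([0.3, 0.7])

# Gradients of two toy losses at x = (1, -2): ||x||^2 and x[0]
x = np.array([1.0, -2.0])
grads = np.stack([2 * x, np.array([1.0, 0.0])])

# Step 2: the Jacobian of the reweighted losses is aggregated into one update
# (a plain row mean stands in for an aggregator like MGDA or UPGrad)
jacobian = loss_weights[:, None] * grads
update = jacobian.mean(axis=0)
```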

TorchJD is specifically designed to handle step 2, whereas step 1 (reweighting the losses) is independent of the Jacobian descent algorithm. With GradNorm, some gradients (i.e., Jacobian rows) are used to balance the losses, but this balancing also depends on the loss values and some preserved states.

For these reasons, we won’t be able to merge this PR. Additionally, we don’t see TorchJD incorporating methods like GradNorm in the future, as maintaining them falls outside our intended scope and would require more effort than we’re able to commit.

@EmileAydar (Author) commented

Dear Valerian,

Thank you for your detailed response. I agree with your explanation regarding the role of Aggregators in TorchJD.
My initial proposal for GradNorm was motivated by the need to adaptively balance losses, particularly in scenarios where newly introduced objectives (such as fairness constraints in multiobjective optimization) can naturally be on a smaller scale compared to the primary loss. The idea was to use GradNorm as a pre-aggregation wrapper to reweight the losses and balance their contribution to ensure a fairer aggregation step.

I understand and respect your decision not to integrate GradNorm given that it is not an aggregation heuristic in itself. On a related note, I have been preparing another pull request that focuses on an aggregator implementation, which I believe is more aligned with TorchJD’s intended design.

Thank you again for your prompt and comprehensive feedback. I appreciate the work you and the team do on TorchJD, and I look forward to contributing further in the future.

Warm regards,
Emile

@EmileAydar (Author) commented

---Update---
Following my previous comment, I have committed the files for an aggregator implementation I've been working on for TorchJD: GradVac.

GradVac is an aggregator that modifies gradients when the observed cosine similarity between tasks falls below a desired target. This target can be set as a constant, or computed adaptively via an exponential moving average scheme when left as None.
In contrast, PCGrad addresses gradient conflicts simply by projecting gradients to remove interference. In fact, when GradVac's target is set to zero, its behavior essentially replicates PCGrad, which only removes negative dot products.
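For illustration, the core Gradient Vaccine adjustment for a single pair of gradients can be sketched as follows. This is a standalone numpy sketch with a hypothetical function name, following the paper's closed-form update; with `target=0.0` it reduces to the PCGrad projection:

```python
import numpy as np

def gradvac_adjust(g_i, g_j, target):
    """Nudge g_i toward g_j when their cosine similarity is below `target`.

    With target=0 this reduces to the PCGrad projection."""
    cos = g_i @ g_j / (np.linalg.norm(g_i) * np.linalg.norm(g_j))
    if cos >= target:
        return g_i  # similarity already meets the target: leave g_i untouched
    # closed-form coefficient that raises cos(g_i', g_j) exactly to `target`
    coef = (np.linalg.norm(g_i)
            * (target * np.sqrt(1 - cos ** 2) - cos * np.sqrt(1 - target ** 2))
            / (np.linalg.norm(g_j) * np.sqrt(1 - target ** 2)))
    return g_i + coef * g_j
```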

Would you prefer that I open a new pull request specifically for GradVac, or should I continue the discussion here to propose merging GradVac in place of GradNorm?

Again, thank you for your time and support!

Warm regards,
Emile

@PierreQuinton (Contributor) commented Feb 12, 2025

@EmileAydar Thank you for the proposition! We were unaware of the GradVac aggregator and might consider implementing it in the future; this will require some extra work on our side, for instance determining the properties it satisfies. In the meantime, since you seem to be facing conflicts that PCGrad cannot solve (unless there are only two objectives), you could try training with an aggregator that satisfies the non-conflicting property; you can find the list here. I would personally recommend UPGrad or DualProj, as the others may not find Pareto-optimal points. Another way to pick your aggregator is to check out the trajectories on pages 15-16 of Jacobian descent for multi-objective optimization and select the aggregator that feels most natural for your problem.

We will not have time to add GradVac to the list of aggregators in the short term, so you can also inherit from Aggregator locally and experiment with it. If you get encouraging results, let us know; in particular, if it compares well against all other aggregators on your task, that would be relevant information for us!

@EmileAydar (Author) commented Feb 15, 2025

@PierreQuinton Thank you for your detailed reply.

In my experiments, I noticed that:

  • MGDA is particularly sensitive to small gradients (like those from fairness considerations), unlike the other non-conflicting aggregators; this sometimes results in numerical instabilities, manifested as exploding or undefined Jacobian values. Similar issues were observed with PCGrad, while UPGrad demonstrated superior stability.
  • For one case, I added a small positive epsilon to the diagonal of NashMTL’s gramian. This ensured the matrix remained positive definite, maintained the hard positive-weight constraints, and surprisingly improved its capacity to find balanced Pareto tradeoffs.

I’d be glad to share more detailed, quantitative results if you think that would be helpful.
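The epsilon trick above is standard Tikhonov-style regularization of a Gram matrix; a minimal numpy sketch of the idea (the epsilon value is illustrative, not the one used in the experiments):

```python
import numpy as np

# A rank-deficient Gram matrix, as arises when task gradients are parallel
gramian = np.array([[1.0, 1.0],
                    [1.0, 1.0]])

eps = 1e-6
regularized = gramian + eps * np.eye(2)  # jitter on the diagonal

# The regularized matrix is strictly positive definite
assert np.all(np.linalg.eigvalsh(regularized) > 0.0)
```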

I am also exploring EPO aggregators that integrate a stakeholder preference vector, which I think could be of interest for TorchJD users, especially engineers in business contexts. For example, PMGDA shows potential, along with approaches like Exact Pareto Optimization and Pareto Multi-Task Learning.

Thanks also for considering GradVac for a future update. Although I understand it might be reimplemented on your side, I’ve developed an extension based on your PCGrad code and will be using it in my experiments. I’m happy to provide detailed feedback on its performance, and an acknowledgment in your documentation would be appreciated if my contribution proves valuable.

@ValerianRey (Contributor) commented

Hi @EmileAydar!
Thanks for your work on Gradient Vaccine. It's good to see a first implementation for this!

At the moment, TorchJD only supports stateless (immutable) aggregators. Gradient Vaccine is based on some exponential moving average of the cosine similarities between gradients, which is a state.

We are thinking about adding support for stateful methods in the near future, and Gradient Vaccine could definitely be a good candidate to test this.

So we will keep this PR open for now until we make progress on the stateful structure, and we will get back to it afterwards.
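Once stateful aggregators are supported, the state Gradient Vaccine needs could look like this hypothetical helper; it is purely illustrative (not the TorchJD API) and only tracks the exponential moving average of pairwise cosine similarities between Jacobian rows:

```python
import numpy as np

class CosineEMA:
    """Hypothetical state holder for a stateful GradVac-style aggregator:
    tracks an exponential moving average of pairwise cosine similarities
    between Jacobian rows (not part of the current TorchJD API)."""

    def __init__(self, n_tasks, beta=0.01):
        self.beta = beta
        self.ema = np.zeros((n_tasks, n_tasks))

    def update(self, jacobian):
        # cosine similarity matrix between the rows of the Jacobian
        norms = np.linalg.norm(jacobian, axis=1, keepdims=True)
        cosines = (jacobian @ jacobian.T) / (norms * norms.T)
        self.ema = (1 - self.beta) * self.ema + self.beta * cosines
        return self.ema
```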

@ValerianRey ValerianRey changed the title GradNorm Loss Balancing Integration in TorchJD Add GradVac Feb 21, 2025
@ValerianRey ValerianRey added feat New feature or request package: aggregation labels Feb 21, 2025
@ValerianRey (Contributor) commented

> I am also exploring EPO aggregators that integrate a stakeholder preference vector, that I think could be of interest for TorchJD users, especially for engineers in business contexts. For example, PMGDA shows potential, along with approaches like Exact Pareto Optimization and Pareto Multi-Task Learning.

Thanks for these references!

I think Pareto Multi-Task Learning can't be simply integrated into TorchJD (they don't have the same objective as us).

At first glance, it seems that PMGDA is based on a stateless aggregator, so it should be possible to integrate it into TorchJD. We would however need a more thorough understanding of its theoretical properties before even trying to integrate it.

As for Exact Pareto Optimization, I still have to read the paper. I'll update this comment when I find time for this.
