Use FGD to fine-tune the transformer #16

cyl943123 · 2023-07-14T09:59:12Z

Hi, cool work!

I'm curious how FGD performs when used to fine-tune a transformer on the GLUE tasks.
Have you tried this before?

Thanks!!

belerico · 2023-09-16T20:40:09Z

Hi @cyl943123, nope, I haven't tried. I think this will be difficult: even on simple MNIST, a subtle change in the hyperparameters (the learning rate, for example) led to instabilities. It would still be nice to see whether it generalizes to other tasks. Have you tried anything in this regard?

inikishev · 2024-12-31T20:35:49Z

> Hi, cool work!
>
> I'm curious how FGD performs when used to fine-tune a transformer on the GLUE tasks. Have you tried this before?
>
> Thanks!!

People have fine-tuned language models with MeZO (https://arxiv.org/abs/2305.17333), which is an even more trimmed-down version of this: instead of computing the JVP exactly, it estimates it via finite differences (I haven't tried it though).
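
To make the difference concrete, here is a minimal sketch in plain Python. It is not code from this repository or from the MeZO paper. It assumes a toy quadratic `loss` as a stand-in for a model loss, and the names `Dual`, `exact_jvp`, `finite_difference_jvp`, and `descend` are hypothetical helpers made up for illustration. The exact JVP comes from a tiny dual-number forward pass; the estimate uses a central finite difference along the same random direction. Both then feed the same update rule.

```python
import random


class Dual:
    """A value paired with a tangent, for forward-mode differentiation."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule on the tangent part.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def loss(theta):
    # Toy quadratic with its minimum at theta_i = 1 (an assumption for this sketch).
    return sum((t - 1.0) * (t - 1.0) for t in theta)


def exact_jvp(f, theta, v):
    # One forward pass with tangents v gives the directional derivative of f along v.
    return f([Dual(t, d) for t, d in zip(theta, v)]).tan


def finite_difference_jvp(f, theta, v, eps=1e-3):
    # Two plain forward passes approximate the same directional derivative.
    plus = f([t + eps * d for t, d in zip(theta, v)])
    minus = f([t - eps * d for t, d in zip(theta, v)])
    return (plus - minus) / (2 * eps)


def descend(jvp_fn, theta, steps=200, lr=0.05, seed=0):
    # Forward-gradient style update: theta <- theta - lr * (directional derivative) * v.
    rng = random.Random(seed)
    for _ in range(steps):
        v = [rng.gauss(0.0, 1.0) for _ in theta]
        d = jvp_fn(loss, theta, v)
        theta = [t - lr * d * vi for t, vi in zip(theta, v)]
    return theta


theta0 = [0.0, 2.0, -1.0]
v0 = [random.Random(1).gauss(0.0, 1.0) for _ in theta0]
jvp_exact = exact_jvp(loss, theta0, v0)
jvp_fd = finite_difference_jvp(loss, theta0, v0)
theta_exact = descend(exact_jvp, theta0)
theta_fd = descend(finite_difference_jvp, theta0)
```

On this toy problem the two directional derivatives agree closely, because a central difference is essentially exact for a quadratic. On a real network the finite-difference version trades some accuracy (and sensitivity to `eps`) for not needing forward-mode autodiff at all.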
