Gradient Checkpointing in LLMs #794
zslittlehelper started this conversation in General

In Stable Diffusion, gradient checkpointing is almost taken for granted and is often a must. With LLMs, though, I am not sure whether it is a concept that simply isn't mentioned often, or one so ingrained that no one bothers to mention it. It was recently discussed on Twitter ( https://x.com/prateeky2806/status/1717807126041534921 ), which made me curious how things stand in Axolotl.
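To make the question concrete, here is a minimal sketch of the technique I mean: gradient (activation) checkpointing trades compute for memory by discarding intermediate activations during the forward pass and recomputing them in the backward pass. The toy two-layer MLP below is purely illustrative and has nothing to do with Axolotl's internals.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Toy module whose blocks are wrapped in activation checkpointing."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, x):
        # Activations inside each checkpointed block are not kept; they are
        # recomputed during backward, trading extra compute for lower VRAM.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return x


model = CheckpointedMLP()
x = torch.randn(8, 512)
loss = model(x).sum()
loss.backward()  # the discarded activations are recomputed here
```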
Replies: 1 comment
-
For future reference, it is possible to enable gradient checkpointing to save VRAM.
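As a hedged sketch of what turning it on can look like with Hugging Face Transformers (which Axolotl builds on), see below; the model name and training arguments are illustrative assumptions, not Axolotl's defaults. In Axolotl itself I believe this is exposed as a `gradient_checkpointing: true` flag in the training config YAML.

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Illustrative model choice; any checkpointing-capable causal LM works similarly.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.gradient_checkpointing_enable()  # recompute activations in backward to save VRAM

# The same switch is also available at the Trainer level.
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
)
```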