Gradient Checkpointing in LLMs #794
zslittlehelper started this conversation in General

In Stable Diffusion, gradient checkpointing is almost taken for granted and is often a must. With LLMs, though, I am not sure whether it is a concept that simply isn't mentioned often, or one so ingrained that no one bothers to mention it. It was recently discussed on Twitter ( https://x.com/prateeky2806/status/1717807126041534921 ), which made me curious how things stand in Axolotl.
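To make the question concrete, here is a minimal sketch of the technique I mean: gradient (activation) checkpointing trades compute for memory by discarding intermediate activations during the forward pass and recomputing them in the backward pass. The toy two-layer MLP below is purely illustrative and has nothing to do with Axolotl's internals.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Toy module whose blocks are wrapped in activation checkpointing."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.block2 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, x):
        # Activations inside each checkpointed block are not kept; they are
        # recomputed during backward, trading extra compute for lower VRAM.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return x


model = CheckpointedMLP()
x = torch.randn(8, 512)
loss = model(x).sum()
loss.backward()  # the discarded activations are recomputed here
```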
Replies: 1 comment
-
For future reference, it is possible to enable gradient checkpointing to save VRAM.
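As a hedged sketch of what turning it on can look like with Hugging Face Transformers (which Axolotl builds on), see below; the model name and training arguments are illustrative assumptions, not Axolotl's defaults. In Axolotl itself I believe this is exposed as a `gradient_checkpointing: true` flag in the training config YAML.

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Illustrative model choice; any checkpointing-capable causal LM works similarly.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.gradient_checkpointing_enable()  # recompute activations in backward to save VRAM

# The same switch is also available at the Trainer level.
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
)
```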