FineTrainers is a work-in-progress library to support (accessible) training of video models. Our first priority is to support LoRA training for all popular video models in Diffusers, and eventually other methods like controlnets, control-loras, distillation, etc.
`cogvideox-factory` was renamed to `finetrainers`. If you're looking to train CogVideoX or Mochi with the legacy training scripts, please refer to this README instead. Everything in the `training/` directory will eventually be moved and supported under `finetrainers`.
[Demo video: CogVideoX-LoRA.mp4]
- 🔥 2025-02-12: We have shipped a set of tooling to curate small, high-quality video datasets for fine-tuning. See the datasets documentation page for details!
- 🔥 2025-02-12: Check out eisneim/ltx_lora_training_i2v_t2v! It builds off of `finetrainers` to support image-to-video training for LTX-Video and STG guidance for inference.
- 🔥 2025-01-15: Support for naive FP8 weight-casting training added! This allows training HunyuanVideo in under 24 GB up to specific resolutions.
- 🔥 2025-01-13: Support for T2V full-finetuning added! Thanks to @ArEnSc for taking up the initiative!
- 🔥 2025-01-03: Support for T2V LoRA finetuning of CogVideoX added!
- 🔥 2024-12-20: Support for T2V LoRA finetuning of Hunyuan Video added! We would like to thank @SHYuanBest for his work on a training script here.
- 🔥 2024-12-18: Support for T2V LoRA finetuning of LTX Video added!
Clone the repository and make sure the requirements are installed: `pip install -r requirements.txt`. Then install `diffusers` from source with `pip install git+https://github.com/huggingface/diffusers`. The requirements specify `diffusers>=0.32.1`, but it is always recommended to use the `main` branch of Diffusers for the latest features and bugfixes. Note that the `main` branch of `finetrainers` is also the development branch; stable support should be expected from the release tags.
Check out the latest release tag:

```shell
git fetch --all --tags
git checkout tags/v0.0.1
```
Follow the instructions mentioned in the README for the release tag.
To get started quickly with example training scripts on the main development branch, refer to the following:
The following datasets and Hugging Face organizations host good datasets for quickly testing training:
- Disney Video Generation Dataset
- bigdatapw Video Dataset Collection
- Finetrainers HF Dataset Collection
Please check out `docs/models` and `examples/training` to learn more about supported models for training, and for example reproducible training launch scripts.
Important
It is recommended to use PyTorch 2.5.1 or above for training. Previous versions can lead to completely black videos, OOM errors, or other issues, and are not tested.
Note
The following numbers were obtained from the release branch. The `main` branch is unstable at the moment and may use higher memory.
Model Name | Tasks | Min. LoRA VRAM* | Min. Full Finetuning VRAM^ |
---|---|---|---|
LTX-Video | Text-to-Video | 5 GB | 21 GB |
HunyuanVideo | Text-to-Video | 32 GB | OOM |
CogVideoX-5b | Text-to-Video | 18 GB | 53 GB |
*Measured for training only (no validation) at resolution `49x512x768` (frames x height x width), rank 128, with pre-computation, using FP8 weights and gradient checkpointing. Pre-computation of conditions and latents may require higher limits (but typically under 16 GB).

^Measured for training only (no validation) at resolution `49x512x768`, with pre-computation, using BF16 weights and gradient checkpointing.
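To give a sense of why pre-computed latents are cheap to store compared with model weights, here is a rough sketch estimating the latent size of a single `49x512x768` clip. The compression factors are assumptions modeled on CogVideoX's VAE (8x spatial, 4x temporal, 16 latent channels); other models use different factors, so treat this as illustrative only.

```python
# Estimate the latent tensor size for one 49x512x768 (frames x height x width)
# video clip. Compression factors below are ASSUMPTIONS modeled on CogVideoX's
# VAE (8x spatial, 4x temporal, 16 latent channels); other models differ.
frames, height, width = 49, 512, 768
spatial_compression = 8
temporal_compression = 4
latent_channels = 16
bytes_per_element = 2  # BF16

# Video VAEs typically keep the first frame and compress the rest temporally.
latent_frames = (frames - 1) // temporal_compression + 1
latent_height = height // spatial_compression
latent_width = width // spatial_compression

num_elements = latent_channels * latent_frames * latent_height * latent_width
size_mib = num_elements * bytes_per_element / 2**20

print(latent_frames, latent_height, latent_width)  # 13 64 96
print(f"{size_mib:.2f} MiB per clip")  # 2.44 MiB per clip
```

A few MiB per clip is negligible next to the model itself, which is why pre-computation trades a small amount of disk for a large VRAM saving during training.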
If you would like to use a custom dataset, refer to the dataset preparation guide here.
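If your dataset follows the simple two-file layout used by the legacy scripts — a `prompts.txt` with one caption per line and a `videos.txt` with one relative video path per line (treat this layout as an assumption; the linked guide is authoritative) — a quick sanity check before training can catch mismatched or missing files:

```python
# Sanity-check sketch for a two-file video dataset layout: prompts.txt holds one
# caption per line and videos.txt one relative video path per line. This layout
# is an ASSUMPTION based on the legacy training scripts; follow the dataset
# preparation guide for the authoritative format.
from pathlib import Path
import tempfile

def validate_dataset(root: Path) -> int:
    """Return the number of samples; raise if the two files disagree."""
    prompts = (root / "prompts.txt").read_text().splitlines()
    videos = (root / "videos.txt").read_text().splitlines()
    if len(prompts) != len(videos):
        raise ValueError(f"{len(prompts)} prompts vs {len(videos)} video paths")
    for line_no, rel_path in enumerate(videos, start=1):
        if not (root / rel_path).is_file():
            raise FileNotFoundError(f"videos.txt line {line_no}: missing {rel_path}")
    return len(prompts)

# Build a toy dataset to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "videos").mkdir()
    (root / "videos" / "clip_0.mp4").touch()  # placeholder file, not a real video
    (root / "prompts.txt").write_text("a cat playing piano\n")
    (root / "videos.txt").write_text("videos/clip_0.mp4\n")
    print(validate_dataset(root))  # 1
```

Running a check like this before a long training job is cheap insurance against a crash several minutes in when a missing file is first touched.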
Check out some amazing projects citing `finetrainers`:
- SkyworkAI's SkyReels-A1
- eisneim's LTX Image-to-Video
- wileewang's TransPixar
- Feizc's Video-In-Context
Check out the following UIs built for `finetrainers`:
`finetrainers` builds on top of and takes inspiration from great open-source libraries — `transformers`, `accelerate`, `torchtune`, `torchtitan`, `peft`, `diffusers`, `bitsandbytes`, `torchao` and `deepspeed`, to name a few. Some of the design choices of `finetrainers` were inspired by `SimpleTuner`.