Releases: a-r-r-o-w/finetrainers
v0.2.0
Finetrainers v0.2.0 🧪
New trainers
- Channel-concatenated control conditioning for Wan2.1 and CogView4
Wan image-conditioning on T2V model: wan-t2v-image-conditioning.mp4
CogView4 control conditioning (Edit + Canny): [example image]
Training adds extra input channels to the patch embedding layer (referred to as the "control injection" layer in finetrainers) to mix conditioning features into the latent stream. This architecture choice is common and has appeared in many models before - CogVideoX-I2V, HunyuanVideo-I2V, Alibaba's Fun Control models, etc. Given its popularity and simplicity, it is a natural fit for a standalone trainer.
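The new input channels are zero-initialized, so the expanded layer initially computes exactly what the pretrained layer did, and the conditioning pathway is learned during training. A minimal sketch of what such an expansion might look like (hypothetical helper; finetrainers ships its own `_expand_linear_with_zeroed_weights`):

```python
import torch


def expand_linear_with_zeroed_weights(linear: torch.nn.Linear, new_in_features: int) -> torch.nn.Linear:
    # Hypothetical sketch: copy the pretrained weights into the first input
    # columns and zero the columns for the new (control) channels, so the
    # expanded layer ignores the conditioning input at initialization.
    expanded = torch.nn.Linear(
        new_in_features,
        linear.out_features,
        bias=linear.bias is not None,
        device=linear.weight.device,
        dtype=linear.weight.dtype,
    )
    with torch.no_grad():
        expanded.weight.zero_()
        expanded.weight[:, : linear.in_features].copy_(linear.weight)
        if linear.bias is not None:
            expanded.bias.copy_(linear.bias)
    return expanded
```

A trained control LoRA can then be used for inference as in the example below.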
```python
import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)
in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features
# Double the patch-embedding input features so control latents can be
# channel-concatenated with the noisy latents; new weights are zero-initialized.
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(
    pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels
)

load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    # Encode the control image and apply the VAE's latent normalization.
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
```
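Here, `control_channel_concat` temporarily patches the transformer's forward so the control latents are concatenated to `hidden_states` along the channel dimension (`dims=[1]`), matching the doubled input width of the expanded patch-embedding layer.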
New models supported
- FLUX.1-dev
- Wan2.1 I2V
Find example training configs here.
Attention
Support for multiple attention providers for training and inference: PyTorch native, `flash-attn`, `sageattention`, `xformers`, and flex attention. See the docs for more details.
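finetrainers' own provider switch is covered in the linked docs; as a standalone illustration of the PyTorch-native path, backend selection can also be done directly with `torch.nn.attention.sdpa_kernel` (a minimal sketch, not finetrainers' API; requires a recent PyTorch):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict scaled_dot_product_attention to the flash-attention kernel;
# other options include EFFICIENT_ATTENTION and MATH.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```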
Other major changes
- Better regional compilation support
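Regional compilation compiles each repeated transformer block separately rather than the whole model, so Dynamo traces one block and reuses the compiled graph for the rest, substantially cutting cold-start compile time. A rough sketch (assuming the blocks are exposed as `transformer.transformer_blocks`):

```python
import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16).to("cuda")

# Compile each repeated block on its own; since the blocks share a structure,
# torch.compile can reuse the same compiled artifact across all of them.
for i, block in enumerate(pipe.transformer.transformer_blocks):
    pipe.transformer.transformer_blocks[i] = torch.compile(block, fullgraph=True)
```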
What's Changed
- Update project showcase by @a-r-r-o-w in #355
- Flux ModelSpec by @a-r-r-o-w in #358
- Pytorch regional compilation by @a-r-r-o-w in #361
- [Doc] Fix a typo of `flux.md` by @DarkSharpness in #363
- [Fix] Raise ValueError proactively before some confusing errors occur due to wrong input image size by @DarkSharpness in #364
- Improve webdataset caption loading by @a-r-r-o-w in #365
- fix string matching for blocks by @neph1 in #360
- Bump ruff version by @a-r-r-o-w in #367
- Channel-concatenated Control Trainer by @a-r-r-o-w in #310
- Fix #352: FSDP2 argument typo by @a-r-r-o-w in #370
- Support Wan I2V; Better regional compile support by @a-r-r-o-w in #375
- chore: save all weights with step-specific directories by @Leojc in #379
- fix: lora loading for final validation by @Leojc in #382
- Fix posterior computation and control tests by @a-r-r-o-w in #384
- Support flash/flex/xformers/sage attention by @a-r-r-o-w in #377
New Contributors
- @DarkSharpness made their first contribution in #363
Full Changelog: v0.1.0...v0.2.0
v0.1.0
Finetrainers v0.1.0 🧪
output.mp4
New models supported
- CogView4
- Wan 2.1
Checkpoints released
- https://huggingface.co/finetrainers/Wan2.1-T2V-1.3B-crush-smol-v0
- https://huggingface.co/finetrainers/Wan2.1-T2V-1.3B-3dgs-v0
- https://huggingface.co/finetrainers/CogView4-6B-rider-waite-tarot-v0
- https://huggingface.co/finetrainers/CogView4-6B-rider-waite-tarot-v0-shifted-sigmas
Other major changes
- Full support for Accelerate as a parallelization backend again
- Opt-in precomputation. Helpful for smaller datasets
- Better remote dataset loading support
- Support for `datasets>=3.4.0`
- Bug fix: Layerwise Casting now works with Wan (see the sketch after this list)
- Bug fix: LTX Video training now works with batch_size > 1
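For layerwise casting, the relevant diffusers call looks roughly like this (a minimal sketch; the Wan repo id and dtype choices are assumptions):

```python
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)

# Store transformer weights in fp8 and upcast each layer to bf16 only while
# it computes, trading a little speed for a large memory saving.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)
```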
What's Changed
- calling save_model_card correctly by @vivkul in #275
- 3D Parallel + Model Spec API by @a-r-r-o-w in #245
- CogVideoX ModelSpec by @a-r-r-o-w in #280
- Remove unused CogVideoX code by @a-r-r-o-w in #282
- Wan T2V ModelSpec by @a-r-r-o-w in #281
- update docs by @a-r-r-o-w in #284
- Improve local dataset loading by @a-r-r-o-w in #289
- Add 3DGS dataset example for Wan by @a-r-r-o-w in #290
- HunyuanVideo ModelSpec by @a-r-r-o-w in #287
- Fix full rank config by @a-r-r-o-w in #292
- update docs by @a-r-r-o-w in #294
- CogView4 ModelSpec by @a-r-r-o-w in #297
- PyPI package by @a-r-r-o-w in #298
- Minor cleanup and doc updates by @a-r-r-o-w in #304
- Remove forceful precomputation behaviour; Improvements to data loading by @a-r-r-o-w in #303
- Webdataset improvements; CogView4 example with The Simpsons webdataset by @a-r-r-o-w in #305
- Fix for #306 by @a-r-r-o-w in #307
- Fix Wan scaling due to upstream changes by @a-r-r-o-w in #308
- Fix CogVideoX config error for invert_scale_latents by @a-r-r-o-w in #309
- add valid names to dataset docs by @neph1 in #318
- Fix Layerwise Casting by @a-r-r-o-w in #316
- Fix enable_model_cpu_offload problems by @a-r-r-o-w in #320
- Patch WanTimeTextImageEmbedding forward only with fp8 by @a-r-r-o-w in #327
- Lower memory requirements on single GPU by @a-r-r-o-w in #321
- Respect report_to if set to none by @a-r-r-o-w in #331
- Cleanup model load methods by @a-r-r-o-w in #333
- add init.py to sft_trainer/ to fix #335 by @jbilcke-hf in #336
- Add version guard for datasets 3.4.0; Fix argument documentation by @a-r-r-o-w in #332
- Update dataset.py by @a-r-r-o-w in #337
- Remove hard-coded batch size configuration in LTXModel by @Linyou in #340
- Fix missing comma in loading hunyuan video tokenizer by @a-r-r-o-w in #341
- Add back accelerate compatibility by @a-r-r-o-w in #339
- Relative -> Absolute imports by @a-r-r-o-w in #343
- Prepare for v0.1.0 release by @a-r-r-o-w in #322
New Contributors
- @vivkul made their first contribution in #275
- @neph1 made their first contribution in #318
- @jbilcke-hf made their first contribution in #336
- @Linyou made their first contribution in #340
Full Changelog: v0.0.1...v0.1.0
v0.0.1
FineTrainers v0.0.1 🧪
FineTrainers is a work-in-progress library to support (accessible) training of diffusion models. The following models are currently supported (based on Diffusers):
- CogVideoX T2V (versions 1.0 and 1.5)
- LTX Video
- Hunyuan Video
The legacy/deprecated scripts also support CogVideoX I2V and Mochi.
Currently, LoRA and full-rank finetuning are supported. With time, more models and training techniques will be supported. We thank our many contributors for their amazing work on improving finetrainers. They are mentioned below in the "New Contributors" section.
In a short timespan, finetrainers has found its way into multiple research works, which has been a very motivating factor for us. They are mentioned in the "Featured Projects" section of the README. We hope you find them interesting and continue to build & work on interesting ideas, while sharing your research artifacts openly!
Some artifacts that we've released ourselves are available here: https://huggingface.co/finetrainers
We plan to focus on the core algorithms/models that users would like supported quickly, based primarily on the feedback we've received (thanks to everyone who's spoken with me regarding this - your time is invaluable!). The major asks are:
- more models and faster support for newer models (we will open this up for contributions after a major open but pending PR, and add many ourselves!)
- compatibility with UIs that do not support standardized implementations from diffusers (we will write two-way conversion scripts for new models that are added to diffusers, so that it is easy to obtain original-format weights from diffusers-format weights)
- more algorithms (Control LoRA, ControlNets for video models and VideoJAM are some of the highly asked techniques -- we will prioritize this!)
- Dataset QoL changes (this is a WIP in an open but pending PR)
Let us know what you'd like to see next & stay tuned for interesting updates!
output.mp4
What's Changed
- CogVideoX LoRA and full finetuning by @a-r-r-o-w in #1
- Low-bit memory optimizers, CpuOffloadOptimizer, Memory Reports by @a-r-r-o-w in #3
- Pin memory support in dataloader by @a-r-r-o-w in #5
- DeepSpeed fixes by @a-r-r-o-w in #7
- refactor readme i. by @sayakpaul in #8
- DeepSpeed and DDP Configs by @a-r-r-o-w in #10
- Full finetuning memory requirements by @a-r-r-o-w in #9
- Multi-GPU parallel encoding support for training videos. by @zRzRzRzRzRzRzR in #6
- CogVideoX I2V; CPU offloading; Model README descriptions by @a-r-r-o-w in #11
- add VideoDatasetWithResizeAndRectangleCrop dataset resize crop by @glide-the in #13
- add "max_sequence_length": model_config.max_text_seq_length, by @glide-the in #15
- readme updates + refactor by @a-r-r-o-w in #14
- Update README.md by @a-r-r-o-w in #17
- merge by @zRzRzRzRzRzRzR in #18
- Draft of Chinese README by @zRzRzRzRzRzRzR in #19
- docs: update README.md by @eltociear in #21
- Update requirements.txt (fixed typo) by @Nojahhh in #24
- Update README and Contribution guide by @zRzRzRzRzRzRzR in #20
- Lower requirements versions by @a-r-r-o-w in #27
- Update for windows compability by @Nojahhh in #32
- [Docs] : Update README.md by @FarukhS52 in #35
- Improve dataset preparation support + multiresolution prep by @a-r-r-o-w in #39
- Update prepare_dataset.sh by @a-r-r-o-w in #42
- improve dataset preparation by @a-r-r-o-w in #43
- more dataset fixes by @a-r-r-o-w in #49
- fix: correct type in .py files by @DhanushNehru in #52
- fix: resuming from a checkpoint when using deepspeed. by @sayakpaul in #38
- Windows support for T2V scripts by @a-r-r-o-w in #48
- Fixed optimizers parsing error in bash scripts by @Nojahhh in #61
- Update readme to install diffusers from source by @Yuancheng-Xu in #59
- Update README.md by @a-r-r-o-w in #73
- add some script of lora test by @zRzRzRzRzRzRzR in #66
- I2V multiresolution finetuning by removing learned PEs by @a-r-r-o-w in #31
- adaption for CogVideoX1.5 by @jiashenggu in #92
- docs: fix help message in args.py by @Leojc in #98
- sft with multigpu by @zhipuch in #84
- [feat] add Mochi-1 trainer by @sayakpaul in #90
- wandb tracker in scheduling problems during the training initiation and training stages by @glide-the in #100
- fix format specifier. by @sayakpaul in #104
- Unbound fix by @glide-the in #105
- feat: support checkpointing saving and loading by @sayakpaul in #106
- RoPE fixes for 1.5, bfloat16 support in prepare_dataset, gradient_accumulation grad norm undefined fix by @a-r-r-o-w in #107
- Update README.md to include mochi-1 trainer by @sayakpaul in #112
- add I2V sft and fix an error by @jiashenggu in #97
- LTX Video by @a-r-r-o-w in #123
- Hunyuan Video LoRA by @a-r-r-o-w in #126
- Precomputation of conditions and latents by @a-r-r-o-w in #129
- Grad Norm tracking in DeepSpeed by @a-r-r-o-w in #148
- fix validation bug by @a-r-r-o-w in #149
- [feat] support DeepSpeed. by @sayakpaul in #139
- [optimization] support 8bit optims from bitsandbytes by @sayakpaul in #163
- [Chore] bulk update styling and formatting by @sayakpaul in #170
- Update README.md to fix graph paths by @sayakpaul in #171
- Support CogVideoX T2V by @sayakpaul in #165
- Fix scheduler bugs by @sayakpaul in #177
- scheduler fixes part ii by @sayakpaul in #178
- [CI] add a workflow to do quality checks. by @sayakpaul in #180
- support model cards by @sayakpaul in #176
- [docs] refactor docs for easier info parsing by @sayakpaul in #175
- Allow images; Remove LLM generated prefixes; Allow JSON/JSONL; Fix bugs by @a-r-r-o-w in #158
- simplify docs part ii by @sayakpaul in #190
- Update requirements by @a-r-r-o-w in #189
- Fix minor bug with function call that doesn't exist. by @ArEnSc in #195
- Precomputation folder name based on model name by @a-r-r-o-w in #196
- Better defaults for LTXV by @a-r-r-o-w in #198
- [core] Fix loading of precomputed conditions and latents by @sayakpaul in #199
- Epoch loss by @a-r-r-o-w in #201
- Shell script to minimally test supported models on a real dataset by @sayakpaul in #204
- Update pr_tests.yml to update ruff version by @sayakpaul in #205
- Fix the checkpoint dir bug in `get_intermediate_ckpt_path` by @Awcrr in #207
- Argument descriptions by @a-r-r-o-w in #208
- Improve argument handling by @a-r-r-o-w in #209
- Helpful messages by @a-r-r-o-w in #210
- Full Finetuning for LTX pos...