
Releases: a-r-r-o-w/finetrainers

v0.2.0

25 Apr 13:23

Finetrainers v0.2.0 🧪

New trainers

  • Channel-concatenated control conditioning for Wan2.1 and CogView4
      ◦ Wan image-conditioning on the T2V model (demo: wan-t2v-image-conditioning.mp4)
      ◦ CogView4 control conditioning (Edit + Canny)

The training adds extra input channels to the patch embedding layer (referred to as the "control injection" layer in finetrainers) to mix conditioning features into the latent stream. This architecture choice is very common and has been used in many models before: CogVideoX-I2V, HunyuanVideo-I2V, Alibaba's Fun Control models, etc. Because it is both popular and simple, it is a good fit to support as a standalone trainer.
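
The snippet below shows how this is used at inference time with the released CogView4 edit LoRA: the patch-embedding projection is expanded with zero-initialized weights, the LoRA is loaded, and the control latents are concatenated with the denoising latents along the channel dimension.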

import torch
from diffusers import CogView4Pipeline
from diffusers.utils import load_image
from finetrainers.models.utils import _expand_linear_with_zeroed_weights
from finetrainers.patches import load_lora_weights
from finetrainers.patches.dependencies.diffusers.control import control_channel_concat

dtype = torch.bfloat16
device = torch.device("cuda")
generator = torch.Generator().manual_seed(0)

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=dtype)

in_channels = pipe.transformer.config.in_channels
patch_channels = pipe.transformer.patch_embed.proj.in_features
# Double the input features of the patch-embedding projection so it can take the
# channel-concatenated control latents; the new weights are zero-initialized, so
# the base model's behavior is unchanged before the LoRA is applied.
pipe.transformer.patch_embed.proj = _expand_linear_with_zeroed_weights(pipe.transformer.patch_embed.proj, new_in_features=2 * patch_channels)

load_lora_weights(pipe, "finetrainers/CogView4-6B-Edit-LoRA-v0", "cogview4-lora")
pipe.set_adapters("cogview4-lora", 0.9)
pipe.to(device)

prompt = "Make the image look like it's from an ancient Egyptian mural."
control_image = load_image("examples/training/control/cogview4/omni_edit/validation_dataset/0.png")
height, width = 1024, 1024

with torch.no_grad():
    # Prepare the initial noise latents for generation.
    latents = pipe.prepare_latents(1, in_channels, height, width, dtype, device, generator)
    # Encode the control image into latents with the VAE and normalize them.
    control_image = pipe.image_processor.preprocess(control_image, height=height, width=width)
    control_image = control_image.to(device=device, dtype=dtype)
    control_latents = pipe.vae.encode(control_image).latent_dist.sample(generator=generator)
    control_latents = (control_latents - pipe.vae.config.shift_factor) * pipe.vae.config.scaling_factor

# Concatenate the control latents onto the hidden_states input along the channel dimension for every transformer forward pass.
with control_channel_concat(pipe.transformer, ["hidden_states"], [control_latents], dims=[1]):
    image = pipe(prompt, latents=latents, num_inference_steps=30, generator=generator).images[0]

image.save("output.png")
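
For intuition, here is a minimal sketch of what "expanding a linear layer with zeroed weights" means. This is a hypothetical re-implementation for illustration only; the actual _expand_linear_with_zeroed_weights helper in finetrainers may differ in its details.

import torch
import torch.nn as nn

def expand_linear_zero_init(linear: nn.Linear, new_in_features: int) -> nn.Linear:
    # Create a wider projection, copy the pretrained weights into the original
    # columns, and zero the columns corresponding to the new control channels.
    expanded = nn.Linear(
        new_in_features,
        linear.out_features,
        bias=linear.bias is not None,
        dtype=linear.weight.dtype,
        device=linear.weight.device,
    )
    with torch.no_grad():
        expanded.weight.zero_()
        expanded.weight[:, : linear.in_features] = linear.weight
        if linear.bias is not None:
            expanded.bias.copy_(linear.bias)
    # At initialization the extra (control) channels contribute nothing, so the
    # expanded layer reproduces the base model's outputs exactly.
    return expanded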

New models supported

  • FLUX.1-dev
  • Wan2.1 I2V

Find example training configs here.

Attention

Support for multiple attention providers for training and inference: PyTorch native, flash-attn, sageattention, xformers, and flex attention. See the docs for more details.
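
As a rough illustration of the PyTorch-native case only (this is plain PyTorch, not the finetrainers provider API; see the docs for how providers are configured in finetrainers), restricting scaled dot-product attention to a specific backend looks like this:

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustration only (requires torch >= 2.3 and a CUDA device).
device, dtype = "cuda", torch.bfloat16
q = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)

# Force the flash-attention kernel for scaled_dot_product_attention inside this block.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)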

Other major changes

  • Better regional compilation support (see the sketch below)
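
Regional compilation applies torch.compile to each repeated transformer block rather than to the whole model, so the compiled artifact is reused across identical blocks and cold-start compilation time drops. A minimal sketch of the idea, assuming a diffusers transformer that exposes its blocks as a transformer_blocks ModuleList (finetrainers' exact integration may differ):

import torch
from diffusers import CogView4Pipeline

pipe = CogView4Pipeline.from_pretrained("THUDM/CogView4-6B", torch_dtype=torch.bfloat16).to("cuda")

# Compile each repeated block separately instead of wrapping the whole transformer.
blocks = pipe.transformer.transformer_blocks
for i in range(len(blocks)):
    blocks[i] = torch.compile(blocks[i])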

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.2.0

v0.1.0

21 Mar 11:26

Finetrainers v0.1.0 🧪

output.mp4

New models supported

  • CogView4
  • Wan 2.1

Checkpoints released

Other major changes

  • Full support for Accelerate as a parallelization backend again
  • Opt-in precomputation, helpful for smaller datasets
  • Better remote dataset loading support
  • Support for datasets>=3.4.0
  • Bug fix: Layerwise Casting now works with Wan (see the sketch below)
  • Bug fix: LTX Video training now works with batch_size > 1
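
For context on the Layerwise Casting fix: layerwise casting is a diffusers memory optimization that stores weights in a low-precision dtype and upcasts them layer by layer to the compute dtype during the forward pass. A minimal sketch, assuming a recent diffusers release that exposes enable_layerwise_casting (the model id below is illustrative):

import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
# Store transformer weights in float8 and upcast each layer to bfloat16 on the fly.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)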

What's Changed

New Contributors

Full Changelog: v0.0.1...v0.1.0

v0.0.1

24 Feb 10:15

FineTrainers v0.0.1 🧪

FineTrainers is a work-in-progress library to support (accessible) training of diffusion models. The following models are currently supported (based on Diffusers):

  • CogVideoX T2V (versions 1.0 and 1.5)
  • LTX Video
  • Hunyuan Video

The legacy/deprecated scripts also support CogVideoX I2V and Mochi.

Currently, LoRA and full-rank finetuning are supported. With time, more models and training techniques will be supported. We thank our many contributors for their amazing work on improving finetrainers. They are mentioned below in the "New Contributors" section.

In a short timespan, finetrainers has found its way into multiple research works, which has been a very motivating factor for us. They are mentioned in the "Featured Projects" section of the README. We hope you find them interesting and continue to build & work on interesting ideas, while sharing your research artifacts openly!

Some artifacts that we've released ourselves are available here: https://huggingface.co/finetrainers

We plan to focus on core algorithms/models that users prefer to have support for quickly, primarily based on the feedback we've received (thanks to everyone who's spoken with me regarding this; your time is invaluable!). The major asks are:

  • more models and faster support for newer models (we will open this up for contributions after a major open but pending PR, and add many ourselves!)
  • compatibility with UIs that do not support standardized implementations from diffusers (we will write two-way conversion scripts for new models that are added to diffusers, so that it is easy to obtain original-format weights from diffusers-format weights)
  • more algorithms (Control LoRA, ControlNets for video models, and VideoJAM are some of the most requested techniques -- we will prioritize these!)
  • Dataset QoL changes (this is a WIP in an open but pending PR)

Let us know what you'd like to see next & stay tuned for interesting updates!

output.mp4

What's Changed
