
LTX-Video Image2Video LORA? #240

Closed
BlackTea-c opened this issue Jan 23, 2025 · 9 comments

Comments

@BlackTea-c

Feature request

LTX-Video Image2Video LORA?

Motivation

LTX-Video Image2Video LORA?

Your contribution

LTX-Video Image2Video LORA?

@a-r-r-o-w
Owner

Coming this weekend after some more testing!

@BlackTea-c
Author

great!

@pushchar

pushchar commented Feb 9, 2025

Hi, first of all, thanks for this amazing work!
Are there any updates on the timeline for releasing the LTX image2video finetuning scripts?

@a-r-r-o-w
Owner

@pushchar There is not really a difference in the training algorithm for LTX img2vid. It is supported in #245 and I've gotten some decent results so far, but the PR is taking longer than expected because it requires an extensive amount of testing.

first_frame_conditioning_p = 0.1
min_first_frame_sigma = 0.25

latents = latent_model_conditions.pop("latents")
latents_mean = latent_model_conditions.pop("latents_mean")
latents_std = latent_model_conditions.pop("latents_std")
latents = self._normalize_latents(latents, latents_mean, latents_std)
noise = torch.zeros_like(latents).normal_(generator=generator)

if random.random() < first_frame_conditioning_p:
    # Section 2.4 of the LTX-Video paper mentions that the first-frame timestep should be a small random value.
    # Making an estimated guess, we cap the first-frame sigma at min_first_frame_sigma (0.25).
    # torch.rand_like returns values in [0, 1). We want the first-frame sigma to be <= the actual sigmas used for
    # the remaining frames, so we rescale by multiplying with sigmas, giving a range of [0, sigmas).
    first_frame_sigma = torch.rand_like(sigmas) * sigmas
    first_frame_sigma = torch.min(first_frame_sigma, sigmas.new_full(sigmas.shape, min_first_frame_sigma))

    latents_first_frame, latents_rest = latents[:, :, :1], latents[:, :, 1:]
    noisy_latents_first_frame = FF.flow_match_xt(latents_first_frame, noise[:, :, :1], first_frame_sigma)
    noisy_latents_remaining = FF.flow_match_xt(latents_rest, noise[:, :, 1:], sigmas)
    noisy_latents = torch.cat([noisy_latents_first_frame, noisy_latents_remaining], dim=2)
else:
    noisy_latents = FF.flow_match_xt(latents, noise, sigmas)
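
For context, FF.flow_match_xt above is presumably the standard flow-matching forward interpolation between the clean latents and noise. A minimal sketch of that assumption (the function body, argument names, and broadcasting here are illustrative, not the exact finetrainers signature):

import torch

def flow_match_xt(x0: torch.Tensor, noise: torch.Tensor, sigmas: torch.Tensor) -> torch.Tensor:
    # Assumed behavior: x_t = (1 - sigma) * x_0 + sigma * noise,
    # with sigmas already broadcastable against the latent shape.
    return (1.0 - sigmas) * x0 + sigmas * noise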

@pushchar

I see, thanks!

@eisneim

eisneim commented Feb 12, 2025

@BlackTea-c @pushchar I've successfully trained an LTX image-to-video LoRA here: eisneim/ltx_lora_training_i2v_t2v

@a-r-r-o-w
Owner

@eisneim Wow, this looks amazing! Superb work :) I would love it if you were interested in collaborating on testing and adding new features/algorithms. For example, I'm also planning to add a general-purpose VideoJAM trainer as soon as I find some time after the parallel training work in #245. Until then, if you do make a trainer_videojam.py similar to our existing trainer.py, it would be super cool and could easily be made available for use across many models at once.

@eisneim

eisneim commented Feb 12, 2025

@a-r-r-o-w Cool, if I can pull off what VideoJAM is doing, I will definitely contribute back to finetrainers.
For now I'm still generating tons of optical flow data with just two RTX 4090s. My initial idea is to use LoRAs instead of adding new layers to the DiT, and to use the intermediate latents to guide the final output.
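
A minimal, hypothetical sketch of the LoRA-over-DiT idea described above, attaching peft LoRA adapters to the attention projections of the LTX transformer via diffusers. The class name, checkpoint path, rank, and target module names are assumptions for illustration, not taken from eisneim's repo:

import torch
from diffusers import LTXVideoTransformer3DModel
from peft import LoraConfig

# Hypothetical: load the LTX-Video DiT and freeze its base weights.
transformer = LTXVideoTransformer3DModel.from_pretrained(
    "Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16
)
transformer.requires_grad_(False)

# Attach LoRA adapters to the attention projections instead of adding new layers,
# so only the low-rank matrices are trained.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

trainable = [p for p in transformer.parameters() if p.requires_grad]
print(f"trainable LoRA parameters: {sum(p.numel() for p in trainable):,}")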

@a-r-r-o-w
Owner

Support has now been added, with a reproducible example script and checkpoint!
