Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support model cards #176

Merged
merged 13 commits into from
Jan 5, 2025
Merged

support model cards #176

merged 13 commits into from
Jan 5, 2025

Conversation

sayakpaul
Copy link
Collaborator

  • Adds support for model cards.
  • Allows to pass fps.
  • Factors out artifacts generation and logging related utilities from Trainer to keep it leaner.

@sayakpaul sayakpaul requested a review from a-r-r-o-w January 3, 2025 11:07
from PIL import Image


def save_model_card(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to keep it minimal yet sufficient for now.

Copy link
Owner

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@sayakpaul sayakpaul changed the title support model cards and passing fps support model cards Jan 3, 2025
@sayakpaul
Copy link
Collaborator Author

Just tested the local changes on a small run: https://huggingface.co/sayakpaul/cog2b_disney (with #178 merged in)

LMK if there's anything else left off to be addressed in this PR.

Copy link
Owner

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the changes look good. I haven't run any tests to verify but the I'm assuming this comes from changes introduced here so looks good to merge. Okay to merge if you feel this is good to go, thanks!

@sayakpaul
Copy link
Collaborator Author

I haven't run any tests to verify but the I'm assuming this comes from changes introduced here so looks good to merge.

Yes, that is correct.

Would you rather have me run some complete runs for Cog (2b, 5b)? Completely fine by me if so.

@@ -988,6 +991,8 @@ def validate(self, step: int, final_validation: bool = False) -> None:
# TODO: this should be configurable here as well as in validation runs where we call the pipeline that has `fps`.
export_to_video(artifact_value, filename, fps=15)
artifact_value = wandb.Video(filename, caption=prompt)
if accelerator.is_main_process:
prompts_to_filenames[prompt] = filename
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just tested it and it seems like here the filename ends up becoming something like /raid/aryan/final-validation-.... This is because we join the output_dir to the filename when saving the video

image

So, I think we need to make sure we use the filename before it gets prefixed with output_dir

Copy link
Collaborator Author

@sayakpaul sayakpaul Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure about it.

I inspected the output_dir:

ls /raid/.cache/huggingface/sayak/cog2b/cog2b_disney
checkpoint-10                             pytorch_lora_weights.safetensors               validation-5-0-BW_STYLE-A-black-and-whit.mp4
checkpoint-5                              README.md                                      validation-5-1-A-woman-with-long-brown-h.mp4
final-10-0-BW_STYLE-A-black-and-whit.mp4  validation-10-0-BW_STYLE-A-black-and-whit.mp4
final-10-1-A-woman-with-long-brown-h.mp4  validation-10-1-A-woman-with-long-brown-h.mp4

Seems alright no?

Also https://huggingface.co/sayakpaul/cog2b_disney seems alright in the sense that the resultant model card looks as expected. But LMK if I am missing something obvious.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to say that the model card does not render the demo video because of the incorrect path to the video relative to the uploaded repository (the case with your repo as well).

In the created model card, the path shouldn't contain the full path with output_dir (for example, right now it creates something like /raid/aryan/cogvideox-lora/validation-video.mp4). But for the model card, it should just be validation-video.mp4)

Copy link
Collaborator Author

@sayakpaul sayakpaul Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-r-r-o-w
Copy link
Owner

a-r-r-o-w commented Jan 5, 2025

The training seems to work phenomenally, so all changes related to Cog seem good! https://wandb.ai/aryanvs/finetrainers-cogvideox/runs/6925e91q. It seems like I used too high a learning rate because the validation videos collapses a bit in some runs, but definitely seems like the model is learning and there is no quality degradation.

Would you rather have me run some complete runs for Cog (2b, 5b)? Completely fine by me if so.

Completely up to you if you'd like to release something for a specific style! I did a 1000 step run for Cakify so maybe you could check with Disney dataset if you'd like. T2V lora for effects like cakify is almost always a miss, so will try to do something better with I2V in trainer when we add it. My experiments are more aimed towards figuring out if SNR loss weighting has been implemented correctly in the original Cog codebase as well as our scripts. Will probably be easier to figure out with full finetuning so will put on hold and prioritize memory optimizations

@sayakpaul
Copy link
Collaborator Author

@a-r-r-o-w thanks for the positive note and testing that!

I think I will merge this PR given you have verified it. But before that I will wait for your response to #176 (comment).

@sayakpaul sayakpaul requested a review from a-r-r-o-w January 5, 2025 13:49
@sayakpaul
Copy link
Collaborator Author

Sorry for requesting the review again but just want to be sure. Have tested in 800a47f.

@a-r-r-o-w a-r-r-o-w merged commit b3d0ba8 into main Jan 5, 2025
1 check passed
@a-r-r-o-w a-r-r-o-w deleted the model-card branch January 5, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants