AttributeError when training from CogVideoX-5b-I2V #47

Open
Hickey8 opened this issue Feb 15, 2025 · 18 comments

@Hickey8

Hickey8 commented Feb 15, 2025

Hi! I set export MODEL_PATH="/data/Pretrained_models/CogVideoX-5b-I2V" and export CONFIG_PATH="/data/Pretrained_models/CogVideoX-5b-I2V" and started training from scratch, but I hit the following error:
[rank1]: Traceback (most recent call last):
[rank1]: File "/data/Code/ConsisID/train.py", line 1359, in
[rank1]: main(args)
[rank1]: File "/data/Code/ConsisID/train.py", line 485, in main
[rank1]: transformer.enable_gradient_checkpointing()
[rank1]: File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 189, in enable_gradient_checkpointing
[rank1]: self.apply(partial(self._set_gradient_checkpointing, value=True))
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 173, in getattr
[rank1]: return super().getattr(name)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/miniconda/envs/consis/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1931, in getattr
[rank1]: raise AttributeError(
[rank1]: AttributeError: 'ConsisIDTransformer3DModel' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

@Hickey8
Author

Hickey8 commented Feb 15, 2025

I found that the bug was caused by yesterday's revision, which deleted the function '_set_gradient_checkpointing' from transformer_consisid.py.
Image
When I roll back to the previous version and re-add the '_set_gradient_checkpointing' function, the bug is fixed. Another solution based on the latest revision is to delete the gradient_checkpointing arg in train_single_rank.sh, but that leads to CUDA out of memory on an 80 GB H100.
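
For reference, this is roughly the form that hook takes in older diffusers-style model classes. It is a minimal sketch of what can be re-added to ConsisIDTransformer3DModel in transformer_consisid.py, assuming the standard pattern rather than the exact deleted code:

```python
# Minimal sketch (assumed from the common diffusers pattern, not the exact deleted code).
# Method of ConsisIDTransformer3DModel, called by enable_gradient_checkpointing() via
# self.apply(partial(self._set_gradient_checkpointing, value=True)).
def _set_gradient_checkpointing(self, module, value=False):
    # Flip the flag on every submodule that supports checkpointing.
    if hasattr(module, "gradient_checkpointing"):
        module.gradient_checkpointing = value
```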

@SHYuanBest
Member

SHYuanBest commented Feb 16, 2025

Many thanks for your feedback! Another solution is to install the latest diffusers with pip install git+https://github.com/huggingface/diffusers.git, since diffusers has removed _set_gradient_checkpointing.

@Hickey8
Author

Hickey8 commented Feb 16, 2025

I think I have already installed the latest version, diffusers==0.33.0.dev0 (if I'm not mistaken), but I still get the error. I also found that _set_gradient_checkpointing still exists in models/modeling_utils.py in diffusers.
By the way, does ConsisID support CogVideoX-1.5-5B-I2V yet? I noticed you mentioned that you 'fixed the from_pretrained_cus to load CogVideoX1.5', but also that 'CogVideoX1.5 is not supported yet by ConsisID in forward()'.

@SHYuanBest
Member

Oh, I see. Try pip uninstall diffusers and then reinstall it.

@SHYuanBest
Member

I have added support for CogVideoX-1.5-5B-I2V in my local environment and will update the code in this repo soon.

@Hickey8
Author

Hickey8 commented Feb 17, 2025

Oh, I see. Try pip uninstall diffusers and then reinstall it.

If I uninstall diffusers and reinstall it directly, I only get diffusers==0.32.2 and encounter ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models'.
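
A quick way to confirm which diffusers build is actually being picked up (just a generic sanity check, not something from the repo):

```python
import diffusers

# A source install from GitHub should report a .dev version (e.g. 0.33.0.dev0),
# and the path shows which installation Python is actually importing.
print(diffusers.__version__)
print(diffusers.__file__)
```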

@Hickey8
Author

Hickey8 commented Feb 17, 2025

I have added support for CogVideoX-1.5-5B-I2V in my local environment and will update the code in this repo soon.

Got it, looking forward to your update!

@SHYuanBest
Member

@Hickey8 CogVideoX1.5 is now supported, welcome to try it!

@Hickey8
Author

Hickey8 commented Feb 17, 2025

@Hickey8 CogVideoX1.5 is now supported, welcome to try it!

Hi! I fixed the ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models' mentioned above and tried the latest code, training from scratch with CogVideoX1.5 and the BestWishYsh/ConsisID-preview-data you published. I encountered this error:
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/Code/ConsisID/train.py", line 1395, in
[rank1]:     main(args)
[rank1]:   File "/data/Code/ConsisID/train.py", line 1205, in main
[rank1]:     loss = (loss * dense_masks).sum() / dense_masks.sum()
[rank1]:             ~~~~~^~~~~~~~~~~~~
[rank1]: RuntimeError: The size of tensor a (1209600) must match the size of tensor b (1123200) at non-singleton dimension 1
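
For context, the failing line just averages the per-element loss under a dense mask, so the two tensors must have identical shapes. A sketch of that step with an explicit shape check (the helper is hypothetical and follows the names in the traceback; it is not code from the repo):

```python
import torch

def masked_mean_loss(loss: torch.Tensor, dense_masks: torch.Tensor) -> torch.Tensor:
    # The mismatch above (1209600 vs 1123200 elements) suggests the mask was built
    # for a different latent layout (frame count / spatial size) than the loss.
    if loss.shape != dense_masks.shape:
        raise ValueError(
            f"shape mismatch: loss {tuple(loss.shape)} vs dense_masks {tuple(dense_masks.shape)}"
        )
    return (loss * dense_masks).sum() / dense_masks.sum().clamp(min=1e-8)
```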

@SHYuanBest
Member

@Hickey8 this has been fixed, thanks for the feedback!

@Hickey8
Author

Hickey8 commented Feb 17, 2025

@Hickey8 this has been fixed, thanks for the feedback!

I have checked, and I'm glad to see that training with CogVideoX 1.5 now starts smoothly. But I have another question: when I train from scratch with version 1.0, the first frame of the generated validation results weirdly shows a face image like this.

Image

This issue persisted until step 1500, at which point the loss became NaN. What could be the reason for this? And why does the loss become NaN even though I used a smaller learning rate when training from scratch? Interestingly, version 1.5 shows a similar face-image issue, but it gradually disappears as training steps increase. I haven't checked yet whether the NaN loss also appears as that training progresses.

@Hickey8
Author

Hickey8 commented Feb 18, 2025

CogVideoX1.5 also hit NaN loss around step 1554 and then crashed at step 3022:
Steps: 44%|████▎ | 3022/6920 [4:12:18<4:45:17, 4.39s/it, loss=nan, lr=2.92e-7]fail to detect face using insightface, extract embedding on align face
All reserve images failed, attempting to process frames from video. Error: list index out of range
Frame 0 processing failed, trying next frame. Error: list index out of range
Frame 5 processing failed, trying next frame. Error: list index out of range
Frame 10 processing failed, trying next frame. Error: list index out of range
Frame 15 processing failed, trying next frame. Error: list index out of range
Frame 20 processing failed, trying next frame. Error: list index out of range
Frame 25 processing failed, trying next frame. Error: facexlib align face fail
Frame 30 processing failed, trying next frame. Error: list index out of range
Frame 35 processing failed, trying next frame. Error: list index out of range
Frame 40 processing failed, trying next frame. Error: list index out of range
Frame 45 processing failed, trying next frame. Error: list index out of range
All attempts failed for image 0. No valid embeddings could be generated.
[rank3]: Traceback (most recent call last):
[rank3]: File "/data/Code/ConsisID/train.py", line 1393, in
[rank3]: main(args)
[rank3]: File "/data/Code/ConsisID/train.py", line 1191, in main
[rank3]: model_pred = scheduler.get_velocity(model_output, noisy_video_latents, timesteps)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/data/Code/ConsisID/diffusers/src/diffusers/schedulers/scheduling_dpm_cogvideox.py", line 485, in get_velocity
[rank3]: velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
[rank3]: ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[rank3]: RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

@SHYuanBest
Member

Thanks for your effort!
For the first question, this result is similar to Figure 7 in the paper, but the problem is different. I suspect it is a problem with the dense mask. I have updated the code.
For the second question, you can refer to #17 or #31.
For the last question, it has been fixed in the latest code.

@Hickey8
Author

Hickey8 commented Feb 18, 2025

I have increased the batch size from 1 to 2 and successfully completed one epoch of training with version 1.5. Everything seems to be going well so far, although ID preservation is not good since I have only trained for around 3000 steps. I will continue training for the default 15 epochs to see if any issues arise. Thank you for your help!
By the way, I am not sure what 'Another method is to add a regularization term to the output of the middle layer.' in #17 means exactly. In my experience, normalization is typically applied before the activation function, and the framework already seems to have plenty of LayerNorm. Could you please explain more clearly?

@SHYuanBest
Member

That's great to hear it's going well.
For the question of ID preservation quality, this may be because consisid-preview-data is small and not very high quality. We plan to release more data in the future.
For the question of regularization, the NaN loss is due to the numerical instability of the MMDiT architecture, so adding a regularization term may solve the problem.

@Hickey8
Author

Hickey8 commented Feb 19, 2025

You mean adding weight decay, i.e., adding a regularization term to the loss function?

@SHYuanBest
Member

SHYuanBest commented Feb 19, 2025

You mean adding weight decay, i.e., adding a regularization term to the loss function?

Or limit the output of each 3D attention layer (MMDiT), since it can be very large (compared to cross attention). I think the large outputs may be the reason for the NaN loss.
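
To make this concrete, here is a rough, purely illustrative sketch of one way to do it (transformer_blocks and attn1 are assumed attribute names, not confirmed ConsisID internals): collect the 3D attention outputs with forward hooks and add a small L2 penalty on their magnitude to the training loss, possibly alongside optimizer weight decay (e.g. torch.optim.AdamW(..., weight_decay=...)).

```python
import torch

def register_attn_output_hooks(transformer, store):
    # Collect each 3D attention block's output so its magnitude can be penalized.
    handles = []
    for block in transformer.transformer_blocks:  # assumed attribute name
        def hook(module, inputs, output, store=store):
            out = output[0] if isinstance(output, tuple) else output
            store.append(out)
        handles.append(block.attn1.register_forward_hook(hook))  # assumed attribute name
    return handles

def activation_regularizer(store, weight=1e-4):
    # Small L2 penalty on the collected attention outputs, added to the main loss.
    return weight * sum(h.float().pow(2).mean() for h in store)

# Per training step (sketch):
#   store.clear()
#   model_pred = transformer(...)
#   loss = base_loss + activation_regularizer(store)
```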

@Hickey8
Author

Hickey8 commented Feb 19, 2025

Got it, thanks a lot!
