AttributeError when training from CogVideoX-5b-I2V #47
Much thanks for your feedback! Another solution is to install the latest diffusers.
I think I have already installed the latest version of diffusers (0.33.0.dev0, if I'm not mistaken), but I still got the error. I also found that _set_gradient_checkpointing exists in models/modeling_utils.py in diffusers.
Oh, I see, try …
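One way to sidestep the version mismatch: newer diffusers releases removed the private `_set_gradient_checkpointing` hook in favor of the public method, so guarding the call avoids the AttributeError on either version. This is a sketch; `enable_checkpointing_safely` is my own name, not part of either codebase.

```python
from functools import partial

# Hypothetical compatibility shim: call whichever gradient-checkpointing
# API the installed diffusers version actually exposes.
def enable_checkpointing_safely(model):
    """Enable gradient checkpointing regardless of diffusers version."""
    if hasattr(model, "enable_gradient_checkpointing"):
        # Public API, present in all recent diffusers releases.
        model.enable_gradient_checkpointing()
    elif hasattr(model, "_set_gradient_checkpointing"):
        # Older private hook, applied recursively as modeling_utils.py did.
        model.apply(partial(model._set_gradient_checkpointing, value=True))
    else:
        raise AttributeError("model exposes no gradient-checkpointing API")
```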
I have added support for CogVideoX-1.5-5B-I2V in my local env and will update the code in this repo soon.
If I uninstall diffusers and reinstall it directly, I can only get diffusers==0.32.2 and encounter: ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models'.
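Since PyPI only served 0.32.2 at the time while the dev build identified itself as 0.33.0.dev0, a simple version gate can tell the two apart before attempting the import. This is a sketch; the 0.33 threshold is an assumption drawn from the versions mentioned in this thread.

```python
# Sketch: decide whether the installed diffusers is new enough to ship
# ConsisIDTransformer3DModel, assuming it landed in the 0.33 dev builds.
def has_consisid_support(ver: str) -> bool:
    """Return True for diffusers >= 0.33 (dev suffixes are ignored)."""
    major, minor = (int(p) for p in ver.split(".")[:2])
    return (major, minor) >= (0, 33)
```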
Got it, looking forward to your update!
@Hickey8 CogVideoX1.5 is now supported, welcome to try it!
Hi! I fixed the ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models' mentioned above, and …
@Hickey8 fixed, thanks for the feedback!
I have checked, and I'm glad to see that training with CogVideoX 1.5 starts smoothly. But I have another question: when I train from scratch with version 1.0, the first frame of the generated validation results weirdly shows a face image. This issue persisted until step 1500, at which point the loss became NaN. What could be the reason? And why does the loss become NaN even though I used a smaller learning rate when training from scratch? Interestingly, version 1.5 shows a similar face-image issue, but it gradually disappears as training steps increase. I haven't yet checked whether the NaN loss issue also appears as that training progresses.
CogVideoX1.5 also hit a NaN loss around step 1554 and ran into a bug at step 3022:
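A common way to keep a run alive through occasional non-finite losses is to skip the optimizer update when the loss is NaN/inf and clip gradients otherwise. The following is a minimal sketch, assuming a standard PyTorch loop; `safe_step` and its arguments stand in for the training script's own variables.

```python
import torch

# Sketch of a NaN guard: skip the optimizer step on a non-finite loss
# instead of letting it corrupt the weights, and clip gradients otherwise.
def safe_step(loss, optimizer, parameters, max_norm=1.0):
    """Backprop with gradient clipping; skip the update on non-finite loss."""
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False  # step skipped
    loss.backward()
    torch.nn.utils.clip_grad_norm_(parameters, max_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```

This does not explain *why* the loss diverges, but it makes the failure visible (count the skipped steps) rather than fatal.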
Thanks for your effort! |
I have increased the batch size from 1 to 2 and successfully completed one epoch of training with version 1.5. Everything seems to be going well so far, although the ID preservation is not good since I have only trained for around 3000 steps. I will continue training for the default 15 epochs to see if any issues arise. Thank you for your help!
Great to see it going well.
You mean adding weight decay, i.e. adding a regularization term to the loss function?
Or limit the output of each 3D attention layer (MMDiT), since it can be very large compared to the cross-attention output. I think that large output may be the cause of the NaN loss.
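The two mitigations discussed above can be sketched as follows, assuming a PyTorch setup. `TinyBlock` is a toy stand-in for one of the 3D attention blocks, and the clamp bound is a hypothetical choice (65504 is the fp16 maximum).

```python
import torch

class TinyBlock(torch.nn.Module):
    """Toy stand-in for a 3D attention block with a bounded output."""
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)

    def forward(self, x, bound=65504.0):  # 65504 = fp16 max, hypothetical
        # Clamp the block output so one huge activation cannot overflow
        # fp16 and poison the loss with inf/NaN.
        return self.proj(x).clamp(-bound, bound)

block = TinyBlock()
# Weight decay is applied at the optimizer level, not added to the loss
# by hand; AdamW decouples it from the gradient update.
optimizer = torch.optim.AdamW(block.parameters(), lr=1e-5, weight_decay=1e-2)
```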
Got it, Thanks a lot! |
Hi! I set export MODEL_PATH="/data/Pretrained_models/CogVideoX-5b-I2V" and export CONFIG_PATH="/data/Pretrained_models/CogVideoX-5b-I2V", trained from scratch, and met:
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/Code/ConsisID/train.py", line 1359, in <module>
[rank1]:     main(args)
[rank1]:   File "/data/Code/ConsisID/train.py", line 485, in main
[rank1]:     transformer.enable_gradient_checkpointing()
[rank1]:   File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 189, in enable_gradient_checkpointing
[rank1]:     self.apply(partial(self._set_gradient_checkpointing, value=True))
[rank1]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 173, in __getattr__
[rank1]:     return super().__getattr__(name)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data/miniconda/envs/consis/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
[rank1]:     raise AttributeError(
[rank1]: AttributeError: 'ConsisIDTransformer3DModel' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?
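As the traceback shows, this bundled diffusers' enable_gradient_checkpointing() calls self._set_gradient_checkpointing, so a model class used with that API must define the hook. The following is a minimal illustration of that pattern, not the repo's actual code; class and attribute names are my own.

```python
import torch
from functools import partial

class SketchTransformer(torch.nn.Module):
    """Minimal model defining the hook the old diffusers API expects."""
    def __init__(self):
        super().__init__()
        self.gradient_checkpointing = False

    def _set_gradient_checkpointing(self, module, value=False):
        # Flip the flag on every (sub)module that supports checkpointing;
        # modeling_utils.py applies this recursively via Module.apply.
        if hasattr(module, "gradient_checkpointing"):
            module.gradient_checkpointing = value

model = SketchTransformer()
model.apply(partial(model._set_gradient_checkpointing, value=True))
```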