AttributeError when training from CogVideoX-5b-I2V #47

Open
Hickey8 opened this issue Feb 15, 2025 · 18 comments

@Hickey8

Hickey8 commented Feb 15, 2025

Hi! I set export MODEL_PATH="/data/Pretrained_models/CogVideoX-5b-I2V" and export CONFIG_PATH="/data/Pretrained_models/CogVideoX-5b-I2V" and started training from scratch, but I hit the following error:
[rank1]: Traceback (most recent call last):
[rank1]: File "/data/Code/ConsisID/train.py", line 1359, in
[rank1]: main(args)
[rank1]: File "/data/Code/ConsisID/train.py", line 485, in main
[rank1]: transformer.enable_gradient_checkpointing()
[rank1]: File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 189, in enable_gradient_checkpointing
[rank1]: self.apply(partial(self._set_gradient_checkpointing, value=True))
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/Code/ConsisID/diffusers/src/diffusers/models/modeling_utils.py", line 173, in getattr
[rank1]: return super().getattr(name)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/data/miniconda/envs/consis/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1931, in getattr
[rank1]: raise AttributeError(
[rank1]: AttributeError: 'ConsisIDTransformer3DModel' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

@Hickey8
Author

Hickey8 commented Feb 15, 2025

I found that the bug was caused by yesterday's revision, which deleted the function '_set_gradient_checkpointing' from transformer_consisid.py.
Image
When I roll back to the previous version and re-add the '_set_gradient_checkpointing' function, the bug is fixed. Another solution based on the latest revision is to delete the gradient_checkpointing arg in train_single_rank.sh, but that leads to CUDA out of memory on an 80 GB H100.
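
For reference, this is roughly the form that hook takes in older diffusers-style model classes. It is a minimal sketch of what can be re-added to ConsisIDTransformer3DModel in transformer_consisid.py, assuming the standard pattern rather than the exact deleted code:

```python
# Minimal sketch (assumed from the common diffusers pattern, not the exact deleted code).
# Method of ConsisIDTransformer3DModel, called by enable_gradient_checkpointing() via
# self.apply(partial(self._set_gradient_checkpointing, value=True)).
def _set_gradient_checkpointing(self, module, value=False):
    # Flip the flag on every submodule that supports checkpointing.
    if hasattr(module, "gradient_checkpointing"):
        module.gradient_checkpointing = value
```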

@SHYuanBest
Member

SHYuanBest commented Feb 16, 2025

Many thanks for your feedback! Another solution is to install the latest diffusers with pip install git+https://github.com/huggingface/diffusers.git, since diffusers has removed _set_gradient_checkpointing.

@Hickey8
Author

Hickey8 commented Feb 16, 2025

I think I have already installed the latest version, diffusers==0.33.0.dev0 (if I'm not mistaken), but I still get the error. I also found that _set_gradient_checkpointing still exists in models/modeling_utils.py in diffusers.
By the way, does ConsisID support CogVideoX-1.5-5B-I2V yet? I noticed you mentioned that you 'fixed the from_pretrained_cus to load CogVideoX1.5', but also that 'CogVideoX1.5 is not supported yet by ConsisID in forward()'.

@SHYuanBest
Member

Oh, I see. Try pip uninstall diffusers and then reinstall it.

@SHYuanBest
Member

I have added support for CogVideoX-1.5-5B-I2V in my local environment and will update the code in this repo soon.

@Hickey8
Author

Hickey8 commented Feb 17, 2025

Oh, I see. Try pip uninstall diffusers and then reinstall it.

If I uninstall diffusers and reinstall it directly, I only get diffusers==0.32.2 and encounter ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models'.
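
A quick way to confirm which diffusers build is actually being picked up (just a generic sanity check, not something from the repo):

```python
import diffusers

# A source install from GitHub should report a .dev version (e.g. 0.33.0.dev0),
# and the path shows which installation Python is actually importing.
print(diffusers.__version__)
print(diffusers.__file__)
```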

@Hickey8
Author

Hickey8 commented Feb 17, 2025

I have added support for CogVideoX-1.5-5B-I2V in my local environment and will update the code in this repo soon.

Got it, looking forward to your update!

@SHYuanBest
Member

@Hickey8 CogVideoX1.5 is now supported, welcome to try it!

@Hickey8
Author

Hickey8 commented Feb 17, 2025

@Hickey8 CogVideoX1.5 is now supported, welcome to try it!

Hi! I fixed the ImportError: cannot import name 'ConsisIDTransformer3DModel' from 'diffusers.models' mentioned above and tried the latest code, training from scratch with CogVideoX1.5 and the BestWishYsh/ConsisID-preview-data you published. I encountered this error:
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/Code/ConsisID/train.py", line 1395, in
[rank1]:     main(args)
[rank1]:   File "/data/Code/ConsisID/train.py", line 1205, in main
[rank1]:     loss = (loss * dense_masks).sum() / dense_masks.sum()
[rank1]:             ~~~~~^~~~~~~~~~~~~
[rank1]: RuntimeError: The size of tensor a (1209600) must match the size of tensor b (1123200) at non-singleton dimension 1
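
For context, the failing line just averages the per-element loss under a dense mask, so the two tensors must have identical shapes. A sketch of that step with an explicit shape check (the helper is hypothetical and follows the names in the traceback; it is not code from the repo):

```python
import torch

def masked_mean_loss(loss: torch.Tensor, dense_masks: torch.Tensor) -> torch.Tensor:
    # The mismatch above (1209600 vs 1123200 elements) suggests the mask was built
    # for a different latent layout (frame count / spatial size) than the loss.
    if loss.shape != dense_masks.shape:
        raise ValueError(
            f"shape mismatch: loss {tuple(loss.shape)} vs dense_masks {tuple(dense_masks.shape)}"
        )
    return (loss * dense_masks).sum() / dense_masks.sum().clamp(min=1e-8)
```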

@SHYuanBest
Member

@Hickey8 this has been fixed, thanks for the feedback!

@Hickey8
Author

Hickey8 commented Feb 17, 2025

@Hickey8 this has been fixed, thanks for the feedback!

I have checked, and I'm glad to see that training with CogVideoX 1.5 now starts smoothly. But I have another question: when I train from scratch with version 1.0, the first frame of the generated validation results weirdly shows a face image like this.

Image

This issue persisted until step 1500, at which point the loss became NaN. What could be the reason for this? And why does the loss become NaN even though I used a smaller learning rate when training from scratch? Interestingly, version 1.5 shows a similar face-image issue, but it gradually disappears as training steps increase. I haven't checked yet whether the NaN loss also appears as that training progresses.

@Hickey8
Author

Hickey8 commented Feb 18, 2025

CogVideoX1.5 also hit NaN loss around step 1554 and then crashed at step 3022:
Steps: 44%|████▎ | 3022/6920 [4:12:18<4:45:17, 4.39s/it, loss=nan, lr=2.92e-7]fail to detect face using insightface, extract embedding on align face
All reserve images failed, attempting to process frames from video. Error: list index out of range
Frame 0 processing failed, trying next frame. Error: list index out of range
Frame 5 processing failed, trying next frame. Error: list index out of range
Frame 10 processing failed, trying next frame. Error: list index out of range
Frame 15 processing failed, trying next frame. Error: list index out of range
Frame 20 processing failed, trying next frame. Error: list index out of range
Frame 25 processing failed, trying next frame. Error: facexlib align face fail
Frame 30 processing failed, trying next frame. Error: list index out of range
Frame 35 processing failed, trying next frame. Error: list index out of range
Frame 40 processing failed, trying next frame. Error: list index out of range
Frame 45 processing failed, trying next frame. Error: list index out of range
All attempts failed for image 0. No valid embeddings could be generated.
[rank3]: Traceback (most recent call last):
[rank3]: File "/data/Code/ConsisID/train.py", line 1393, in
[rank3]: main(args)
[rank3]: File "/data/Code/ConsisID/train.py", line 1191, in main
[rank3]: model_pred = scheduler.get_velocity(model_output, noisy_video_latents, timesteps)
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: File "/data/Code/ConsisID/diffusers/src/diffusers/schedulers/scheduling_dpm_cogvideox.py", line 485, in get_velocity
[rank3]: velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
[rank3]: ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[rank3]: RuntimeError: The size of tensor a (32) must match the size of tensor b (16) at non-singleton dimension 2

@SHYuanBest
Member

Thanks for your effort!
For the first question, this result is similar to Figure 7 in the paper, but the problem is different. I suspect it is a problem with the dense mask. I have updated the code.
For the second question, you can refer to #17 or #31.
For the last question, it has been fixed in the latest code.

@Hickey8
Author

Hickey8 commented Feb 18, 2025

I have increased the batch size from 1 to 2 and successfully completed one epoch of training with version 1.5. Everything seems to be going well so far, although ID preservation is not good since I have only trained for around 3000 steps. I will continue training for the default 15 epochs to see if any issues arise. Thank you for your help!
By the way, I am not sure what 'Another method is to add a regularization term to the output of the middle layer.' in #17 means exactly. In my experience, normalization is typically applied before the activation function, and the framework already seems to have plenty of LayerNorm. Could you please explain more clearly?

@SHYuanBest
Member

That's great to hear it's going well.
For the question of ID preservation quality, this may be because consisid-preview-data is small and not very high quality. We plan to release more data in the future.
For the question of regularization, the NaN loss is due to the numerical instability of the MMDiT architecture, so adding a regularization term may solve the problem.

@Hickey8
Author

Hickey8 commented Feb 19, 2025

You mean adding weight decay, i.e., adding a regularization term to the loss function?

@SHYuanBest
Member

SHYuanBest commented Feb 19, 2025

You mean adding weight decay, i.e., adding a regularization term to the loss function?

Or limit the output of each 3D attention layer (MMDiT), since it can be very large (compared to cross attention). I think the large outputs may be the reason for the NaN loss.
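
To make this concrete, here is a rough, purely illustrative sketch of one way to do it (transformer_blocks and attn1 are assumed attribute names, not confirmed ConsisID internals): collect the 3D attention outputs with forward hooks and add a small L2 penalty on their magnitude to the training loss, possibly alongside optimizer weight decay (e.g. torch.optim.AdamW(..., weight_decay=...)).

```python
import torch

def register_attn_output_hooks(transformer, store):
    # Collect each 3D attention block's output so its magnitude can be penalized.
    handles = []
    for block in transformer.transformer_blocks:  # assumed attribute name
        def hook(module, inputs, output, store=store):
            out = output[0] if isinstance(output, tuple) else output
            store.append(out)
        handles.append(block.attn1.register_forward_hook(hook))  # assumed attribute name
    return handles

def activation_regularizer(store, weight=1e-4):
    # Small L2 penalty on the collected attention outputs, added to the main loss.
    return weight * sum(h.float().pow(2).mean() for h in store)

# Per training step (sketch):
#   store.clear()
#   model_pred = transformer(...)
#   loss = base_loss + activation_regularizer(store)
```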

@Hickey8
Author

Hickey8 commented Feb 19, 2025

Got it, thanks a lot!
