
Commit 2056ece: update lora training
1 parent 6c9aef3
27 files changed: +1776 −98 lines

README.md (+37 −27)
@@ -61,14 +61,12 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 | <img src="https://github.com/user-attachments/assets/c385a11f-60c7-4919-b0f1-bc5e715f673c" width="80%"> | <video src="https://github.com/user-attachments/assets/0c29ede9-0481-4d40-9c67-a4b6267fdc2d" width="100%"> </video> |
 | <img src="https://github.com/user-attachments/assets/5763f5eb-0be5-4b36-866a-5199e31c5802" width="95%"> | <video src="https://github.com/user-attachments/assets/a8da0a1b-ba7d-45a4-a901-5d213ceaf50e" width="100%"> </video> |
 
-<!-- ### Customizable I2V LoRA Demo
+### Customizable I2V LoRA Demo
 
 | I2V LoRA Effect | Reference Image | Generated Video |
 |:---------------:|:--------------------------------:|:----------------:|
 | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%"> </video> |
 | Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%"> </video> |
-<!-- | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> |
-| Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> | -->
 
 <!-- ## 🧩 Community Contributions -->
@@ -91,7 +89,7 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 - [x] Inference
 - [x] Checkpoints
 - [x] ComfyUI
-- [ ] LoRA training scripts
+- [x] LoRA training scripts
 - [ ] Multi-GPU sequence-parallel inference (faster inference speed on more GPUs)
 - [ ] Diffusers
 - [ ] FP8 quantized weights
@@ -113,6 +111,12 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 - [Tips for Using Image-to-Video Models](#tips-for-using-image-to-video-models)
 - [Using Command Line](#using-command-line)
 - [More Configurations](#more-configurations)
+- [🎉 Customizable I2V LoRA effects training](#-customizable-i2v-lora-effects-training)
+  - [Requirements](#requirements)
+  - [Environment](#environment)
+  - [Training data construction](#training-data-construction)
+  - [Training](#training)
+  - [Inference](#inference)
 - [🔗 BibTeX](#-bibtex)
 - [Acknowledgements](#acknowledgements)
 ---
@@ -230,10 +234,10 @@ If you want to generate a more **stable** video, you can set `--i2v-stability` a
 cd HunyuanVideo-I2V
 
 python3 sample_image2video.py \
-    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
-    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
     --model HYVideo-T/2 \
+    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
     --i2v-mode \
+    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
     --i2v-resolution 720p \
     --i2v-stability \
     --infer-steps 50 \
@@ -250,17 +254,17 @@ If you want to generate a more **high-dynamic** video, you can **unset** `--i2v-
 cd HunyuanVideo-I2V
 
 python3 sample_image2video.py \
-    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
-    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
     --model HYVideo-T/2 \
+    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
     --i2v-mode \
+    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
     --i2v-resolution 720p \
     --infer-steps 50 \
     --video-length 129 \
     --flow-reverse \
     --flow-shift 17.0 \
-    --seed 0 \
     --embedded-cfg-scale 6.0 \
+    --seed 0 \
     --use-cpu-offload \
     --save-path ./results
 ```
@@ -286,7 +290,7 @@ We list some more useful configurations for easy usage:
 
 
 
-<!-- ## 🎉 Customizable I2V LoRA effects training
+## 🎉 Customizable I2V LoRA effects training
 
 ### Requirements
 
@@ -313,11 +317,13 @@ Prompt description: The trigger word is written directly in the video caption. I
 
 For example, AI hair growth effect (trigger): rapid_hair_growth, The hair of the characters in the video is growing rapidly. + original prompt
 
-After having the training video and prompt pair, refer to [here](hyvideo/hyvae_extract/README.md) for training data construction.
+Once you have the training video and prompt pair, refer to [here](hyvideo/hyvae_extract/README.md) for training data construction.
 
 
 ### Training
 ```
+cd HunyuanVideo-I2V
+
 sh scripts/run_train_image2video_lora.sh
 ```
 We list some training-specific configurations for easy usage:
333339

334340
### Inference
335341
```bash
342+
cd HunyuanVideo-I2V
343+
336344
python3 sample_image2video.py \
337-
--model HYVideo-T/2 \
338-
--prompt "Two people hugged tightly, In the video, two people are standing apart from each other. They then move closer to each other and begin to hug tightly. The hug is very affectionate, with the two people holding each other tightly and looking into each other's eyes. The interaction is very emotional and heartwarming, with the two people expressing their love and affection for each other." \
339-
--i2v-mode \
340-
--i2v-image-path ./assets/demo/i2v_lora/imgs/embrace.png \
341-
--i2v-resolution 720p \
342-
--infer-steps 50 \
343-
--video-length 129 \
344-
--flow-reverse \
345-
--flow-shift 5.0 \
346-
--seed 0 \
347-
--use-cpu-offload \
348-
--save-path ./results \
349-
--use-lora \
350-
--lora-scale 1.0 \
351-
--lora-path ./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors
345+
--model HYVideo-T/2 \
346+
--prompt "Two people hugged tightly, In the video, two people are standing apart from each other. They then move closer to each other and begin to hug tightly. The hug is very affectionate, with the two people holding each other tightly and looking into each other's eyes. The interaction is very emotional and heartwarming, with the two people expressing their love and affection for each other." \
347+
--i2v-mode \
348+
--i2v-image-path ./assets/demo/i2v_lora/imgs/embrace.png \
349+
--i2v-resolution 720p \
350+
--i2v-stability \
351+
--infer-steps 50 \
352+
--video-length 129 \
353+
--flow-reverse \
354+
--flow-shift 5.0 \
355+
--embedded-cfg-scale 6.0 \
356+
--seed 0 \
357+
--use-cpu-offload \
358+
--save-path ./results \
359+
--use-lora \
360+
--lora-scale 1.0 \
361+
--lora-path ./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors
352362
```
353363
We list some lora specific configurations for easy usage:
354364

355365
| Argument | Default | Description |
356366
|:-------------------:|:-------:|:----------------------------:|
357367
| `--use-lora` | False | Whether to open lora mode. |
358368
| `--lora-scale` | 1.0 | Fusion scale for lora model. |
359-
| `--lora-path` | "" | Weight path for lora model. | -->
369+
| `--lora-path` | "" | Weight path for lora model. |
360370

361371

362372
## 🔗 BibTeX
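Before pointing `--lora-path` at a trained file, it can help to confirm the exported kohya-format `.safetensors` checkpoint loads and to skim its tensor names. A minimal sketch, assuming the Python `safetensors` package is available in the environment; the path matches the example above:

```bash
# Sketch only: assumes the `safetensors` Python package is installed.
# Lists a few tensors from the exported kohya-format LoRA checkpoint.
python3 - <<'EOF'
from safetensors.torch import load_file

path = "./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors"
weights = load_file(path)  # dict mapping tensor name -> torch.Tensor
print(f"{len(weights)} tensors in {path}")
for name, tensor in list(weights.items())[:5]:
    print(f"  {name}: shape={tuple(tensor.shape)}, dtype={tensor.dtype}")
EOF
```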

README_zh.md (+76 −42)
@@ -52,19 +52,19 @@
 | <img src="https://github.com/user-attachments/assets/5763f5eb-0be5-4b36-866a-5199e31c5802" width="95%"> | <video src="https://github.com/user-attachments/assets/a8da0a1b-ba7d-45a4-a901-5d213ceaf50e" width="100%"> </video> |
 
 
-<!-- ### Customizable I2V LoRA Effect Demo
+### Customizable I2V LoRA Effect Demo
 
 | Effect Type | Reference Image | Generated Video |
 |:---------------:|:--------------------------------:|:----------------:|
 | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%"> </video> |
-| Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%"> </video> | -->
+| Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%"> </video> |
 
 ## 📑 Open-Source Plan
 - HunyuanVideo-I2V (image-to-video model)
   - [x] Inference code
   - [x] Model weights
   - [x] ComfyUI support
-  - [ ] LoRA training scripts
+  - [x] LoRA training scripts
   - [ ] Multi-GPU sequence-parallel inference (faster inference on more GPUs)
   - [ ] Diffusers integration
   - [ ] FP8 quantized weights
@@ -86,6 +86,12 @@
 - [Tips for Using the Image-to-Video Model](#使用图生视频模型的建议)
 - [Using the Command Line](#使用命令行)
 - [More Configurations](#更多配置)
+- [🎉 Customizable I2V LoRA Effect Training](#自定义-i2v-lora-效果训练)
+  - [Requirements](#要求)
+  - [Training Environment](#训练环境)
+  - [Training Data Construction](#训练数据构建)
+  - [Start Training](#开始训练)
+  - [Inference](#推理)
 - [🔗 BibTeX](#-bibtex)
 - [Acknowledgements](#致谢)
@@ -181,23 +187,44 @@ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyua
 - **Avoid overly detailed prompts**: Long or highly detailed prompts can introduce unnecessary transitions in the video output.
 
 ### Using the Command Line
-
+If you want to generate a more **stable** video, set `--i2v-stability` and `--flow-shift 7.0`. Run the following command:
 ```bash
 cd HunyuanVideo-I2V
 
 python3 sample_image2video.py \
     --model HYVideo-T/2 \
-    --prompt "A man with short gray hair plays a red electric guitar." \
+    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
     --i2v-mode \
-    --i2v-image-path ./assets/demo/i2v/imgs/0.png \
+    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
     --i2v-resolution 720p \
+    --i2v-stability \
+    --infer-steps 50 \
     --video-length 129 \
+    --flow-reverse \
+    --flow-shift 7.0 \
+    --seed 0 \
+    --embedded-cfg-scale 6.0 \
+    --use-cpu-offload \
+    --save-path ./results
+```
+If you want to generate a more **high-dynamic** video, **unset** `--i2v-stability` and set `--flow-shift 17.0`. Run the following command:
+```bash
+cd HunyuanVideo-I2V
+
+python3 sample_image2video.py \
+    --model HYVideo-T/2 \
+    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
+    --i2v-mode \
+    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
+    --i2v-resolution 720p \
     --infer-steps 50 \
+    --video-length 129 \
     --flow-reverse \
     --flow-shift 17.0 \
+    --embedded-cfg-scale 6.0 \
     --seed 0 \
     --use-cpu-offload \
-    --save-path ./results
+    --save-path ./results
 ```
 <!-- # ### Run the Gradio server
 # ```bash
@@ -211,23 +238,24 @@ python3 sample_image2video.py \
 
 We list some commonly used configurations for convenience:
 
-| Argument | Default | Description |
-|:----------------------:|:-----------------------------:|:------------------------------------------------------------:|
-| `--prompt` | None | Text prompt for video generation. |
-| `--model` | HYVideo-T/2-cfgdistill | We use HYVideo-T/2 for I2V and HYVideo-T/2-cfgdistill for the T2V mode. |
-| `--i2v-mode` | False | Whether to enable I2V mode. |
-| `--i2v-image-path` | ./assets/demo/i2v/imgs/0.png | Reference image for video generation. |
-| `--i2v-resolution` | 720p | Resolution of the generated video. |
-| `--video-length` | 129 | Length of the generated video. |
-| `--infer-steps` | 50 | Number of sampling steps. |
-| `--flow-shift` | 7.0 | Shift factor for the flow-matching scheduler. |
-| `--flow-reverse` | False | If reversed, learn/sample from t=1 to t=0. |
-| `--seed` | None | Random seed for video generation; if None, a random seed is initialized. |
-| `--use-cpu-offload` | False | Use CPU offloading when loading the model to save memory; necessary for high-resolution video generation. |
-| `--save-path` | ./results | Path for saving generated videos. |
-
-
-<!-- ## 🎉 Customizable I2V LoRA Effect Training
+| Argument | Default | Description |
+|:----------------------:|:-----------------------------:|:----------------------------------------------------------------------------------------------------------------------------------:|
+| `--prompt` | None | Text prompt for video generation. |
+| `--model` | HYVideo-T/2-cfgdistill | We use HYVideo-T/2 for I2V and HYVideo-T/2-cfgdistill for the T2V mode. |
+| `--i2v-mode` | False | Whether to enable I2V mode. |
+| `--i2v-image-path` | ./assets/demo/i2v/imgs/0.png | Reference image for video generation. |
+| `--i2v-resolution` | 720p | Resolution of the generated video. |
+| `--i2v-stability` | False | Whether to use stable mode for I2V inference. |
+| `--video-length` | 129 | Length of the generated video. |
+| `--infer-steps` | 50 | Number of sampling steps. |
+| `--flow-shift` | 7.0 | Shift factor for the flow-matching scheduler. We recommend 7.0 with `--i2v-stability` for a more stable video, and 17.0 with `--i2v-stability` unset for a more dynamic video. |
+| `--flow-reverse` | False | If reversed, learn/sample from t=1 to t=0. |
+| `--seed` | None | Random seed for video generation; if None, a random seed is initialized. |
+| `--use-cpu-offload` | False | Use CPU offloading when loading the model to save memory; necessary for high-resolution video generation. |
+| `--save-path` | ./results | Path for saving generated videos. |
+
+
+## 🎉 Customizable I2V LoRA Effect Training
 
 ### Requirements
 
@@ -242,7 +270,7 @@ python3 sample_image2video.py \
 * **Minimum requirement**: Generating 360p video requires at least 79 GB of GPU memory.
 * **Recommended**: A GPU with 80 GB of memory is recommended for better generation quality.
 * Tested operating system: Linux
-* Note: You can train with 360p data and directly run inference on 540p videos
+* Note: You can train with 360p data and directly run inference on 720p videos
 
 ### Training Environment
 ```
@@ -259,6 +287,8 @@ pip install -r requirements.txt
 
 ### Start Training
 ```
+cd HunyuanVideo-I2V
+
 sh scripts/run_train_image2video_lora.sh
 ```
 We list some training-specific configurations for convenience:
@@ -268,34 +298,38 @@ sh scripts/run_train_image2video_lora.sh
 | `SAVE_BASE` | . | Root path for saving experiment results. |
 | `EXP_NAME` | i2v_lora | Path suffix for saving experiment results. |
 | `DATA_JSONS_DIR` | ./assets/demo/i2v_lora/train_dataset/processed_data/json_path | Directory of data JSONs generated by hyvideo/hyvae_extract/start.sh. |
-| `CHIEF_IP` | 0.0.0.0 | Master node IP address. |
+| `CHIEF_IP` | 127.0.0.1 | Master node IP address. |
 
 ### Inference
 ```bash
+cd HunyuanVideo-I2V
+
 python3 sample_image2video.py \
-    --model HYVideo-T/2 \
-    --prompt "Two people hugged tightly, In the video, two people are standing apart from each other. They then move closer to each other and begin to hug tightly. The hug is very affectionate, with the two people holding each other tightly and looking into each other's eyes. The interaction is very emotional and heartwarming, with the two people expressing their love and affection for each other." \
-    --i2v-mode \
-    --i2v-image-path ./assets/demo/i2v_lora/imgs/embrace.png \
-    --i2v-resolution 540p \
-    --infer-steps 50 \
-    --video-length 129 \
-    --flow-reverse \
-    --flow-shift 5.0 \
-    --seed 0 \
-    --use-cpu-offload \
-    --save-path ./results \
-    --use-lora \
-    --lora-scale 1.0 \
-    --lora-path ./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors \
+    --model HYVideo-T/2 \
+    --prompt "Two people hugged tightly, In the video, two people are standing apart from each other. They then move closer to each other and begin to hug tightly. The hug is very affectionate, with the two people holding each other tightly and looking into each other's eyes. The interaction is very emotional and heartwarming, with the two people expressing their love and affection for each other." \
+    --i2v-mode \
+    --i2v-image-path ./assets/demo/i2v_lora/imgs/embrace.png \
+    --i2v-resolution 720p \
+    --i2v-stability \
+    --infer-steps 50 \
+    --video-length 129 \
+    --flow-reverse \
+    --flow-shift 5.0 \
+    --embedded-cfg-scale 6.0 \
+    --seed 0 \
+    --use-cpu-offload \
+    --save-path ./results \
+    --use-lora \
+    --lora-scale 1.0 \
+    --lora-path ./ckpts/hunyuan-video-i2v-720p/lora/embrace_kohaya_weights.safetensors
 ```
 We list some LoRA-specific configurations for convenience:
 
 | Argument | Default | Description |
 |:-------------------:|:-------:|:----------------------------:|
 | `--use-lora` | None | Whether to enable LoRA mode. |
 | `--lora-scale` | 1.0 | Fusion scale for the LoRA model. |
-| `--lora-path` | "" | Weight path for the LoRA model. | -->
+| `--lora-path` | "" | Weight path for the LoRA model. |
 
 ## 🔗 BibTeX
301335

assets/demo/i2v_lora/imgs/embrace.png (binary image changed: 4.68 MB → 2.22 MB)
New file (path not shown in this view):

@@ -0,0 +1,6 @@
+{
+    "video_path": "/path/to/video.mp4",
+    "raw_caption": {
+        "long caption": "Detailed description text of the video"
+    }
+}
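This new file illustrates the training-meta schema: a `video_path` plus a `raw_caption` object holding a `"long caption"`. Following the trigger-word convention described in the README diff above (trigger phrase prepended to the original caption), one such file could be written as in the sketch below; the destination path, video path, and caption text are illustrative, and hyvideo/hyvae_extract/README.md documents the layout the preprocessing step actually expects.

```bash
# Sketch only: writes one training-meta JSON in the schema shown above.
# Paths and caption text are illustrative; see hyvideo/hyvae_extract/README.md
# for the directory layout the data-construction step expects.
mkdir -p ./train_dataset/meta
cat > ./train_dataset/meta/hair_growth_0001.json <<'EOF'
{
    "video_path": "/path/to/hair_growth_0001.mp4",
    "raw_caption": {
        "long caption": "rapid_hair_growth, The hair of the characters in the video is growing rapidly. A woman stands in a garden as her hair grows past her shoulders."
    }
}
EOF
```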
