README.md (+13 −19)

@@ -42,6 +42,7 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 ## 🔥🔥🔥 News!!
+* Mar 07, 2025: 🔥 We have fixed the bug in our open-source version that caused ID changes. Please try the new model weights of [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V) to ensure full visual consistency in the first frame and produce higher-quality videos.
 * Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).

@@ -53,7 +54,12 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
   <p>Co-creator @D-aiY Director Ding Yi</p>
 </div>

-### Customizable I2V LoRA Demo
+### First Frame Consistency Demo
+
+| Reference Image | Generated Video | Reference Image | Generated Video | Reference Image | Generated Video |

@@ -74,16 +80,16 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
 - Enhance-A-Video (Better Generated Video for Free): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
 - TeaCache (Cache-based Accelerate): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
 - HunyuanVideoGP (GPU Poor version): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
--->
+-->

 ## 📑 Open-source Plan
 - HunyuanVideo-I2V (Image-to-Video Model)
-  - [x] Lora training scripts
   - [x] Inference
   - [x] Checkpoints
   - [x] ComfyUI
+  - [ ] Lora training scripts
   - [ ] Multi-GPU sequence parallel inference (faster inference speed on more GPUs)
   - [ ] Diffusers
   - [ ] FP8 quantized weights

@@ -93,7 +99,7 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
-- [Training data construction](#training-data-construction)
-- [Training](#training)
-- [Inference](#inference)
 - [🔗 BibTeX](#-bibtex)
 - [Acknowledgements](#acknowledgements)
 ---

 ## **HunyuanVideo-I2V Overall Architecture**
-Leveraging the advanced video generation capabilities of [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we have extended its application to image-to-video generation tasks. To achieve this, we employ an image latent concatenation technique to effectively reconstruct and incorporate reference image information into the video generation process.
+Leveraging the advanced video generation capabilities of [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), we have extended its application to image-to-video generation tasks. To achieve this, we employ a token replace technique to effectively reconstruct and incorporate reference image information into the video generation process.

 Since we utilize a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder, we can significantly enhance the model's ability to comprehend the semantic content of the input image and to seamlessly integrate information from both the image and its associated caption. Specifically, the input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data.
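
To make the two conditioning paths described above concrete, here is a minimal PyTorch sketch. It is not code from this repo: every tensor shape is an illustrative assumption, and `image_tokens` is a random stand-in for the MLLM output.

```python
import torch

# Illustrative shapes only; the real latent and token dimensions differ.
B, C, T, H, W = 1, 16, 5, 8, 8   # video latents: batch, channels, frames, height, width
N_IMG, D = 16, 64                # assumed MLLM image-token count and model width

noisy_latents = torch.randn(B, C, T, H, W)  # latents being denoised
ref_latent = torch.randn(B, C, 1, H, W)     # clean VAE latent of the reference image

# Token replace: at each denoising step, overwrite the first-frame latent with
# the clean reference latent so frame 0 is reconstructed exactly.
latents = noisy_latents.clone()
latents[:, :, :1] = ref_latent

# Semantic path: concatenate the MLLM's image tokens with the flattened video
# tokens so full attention mixes image semantics into every video token.
video_tokens = latents.flatten(2).transpose(1, 2)   # (B, T*H*W, C)
video_tokens = torch.nn.Linear(C, D)(video_tokens)  # project to model width
image_tokens = torch.randn(B, N_IMG, D)             # stand-in for MLLM output
sequence = torch.cat([image_tokens, video_tokens], dim=1)
print(sequence.shape)  # torch.Size([1, 336, 64]) = (B, N_IMG + T*H*W, D)
```

The design point is that the first frame is pinned to the reference latent at every step rather than merely conditioned on, which matches the full first-frame visual consistency emphasized in the Mar 07 note above.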

@@ -212,12 +212,6 @@ Similar to [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), HunyuanVideo
 - **Camera Angle (Optional)**: Indicate the perspective or viewpoint.
 - **Avoid Overly Detailed Prompts**: Lengthy or highly detailed prompts can lead to unnecessary transitions in the video output.

-For example:
-1. A man with short gray hair plays a red electric guitar.
-2. A woman sits on a wooden floor, holding a colorful bag.
-3. A bee flaps its wings. The camera movement is Zoom Out/Zoom In/Pan Right.
-4. A little boy closes his mouth, stands up, and lifts his left hand. The background is blurred.
-
 <!-- **For image-to-video models, we recommend using concise prompts to guide the model's generation process. A good prompt should include elements such as background, main subject, action, and camera angle. Overly long or excessively detailed prompts may introduce unnecessary transitions.** -->

 ### Using Command Line

@@ -266,7 +260,7 @@ We list some more useful configurations for easy usage:
 | `--save-path` | ./results | Path to save the generated video. |

-## 🎉 Customizable I2V LoRA effects training
+<!--## 🎉 Customizable I2V LoRA effects training

 ### Requirements
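
To show how such flags compose, here is a hedged invocation sketch in the shell style the repo's usage examples follow. Only `--save-path` and its `./results` default are confirmed by the table above; the script name, the remaining flags, and the image path are assumptions and may not match the repo's current CLI.

```bash
# Hedged sketch: only --save-path is confirmed by the table above; the script
# name, the other flags, and the paths are assumptions and may differ.
python3 sample_image2video.py \
    --prompt "A man with short gray hair plays a red electric guitar." \
    --i2v-image-path ./assets/demo/i2v/imgs/0.png \
    --infer-steps 50 \
    --save-path ./results
```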
@@ -336,7 +330,7 @@ We list some lora specific configurations for easy usage: