[LoRA] Support Wan #10943

Open · wants to merge 5 commits into main
Conversation

a-r-r-o-w (Member) commented on Mar 3, 2025

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w (Member, Author)

Image-to-Video works as expected after the changes to the pipeline. Most of the changes address consistency across implementations and fix default values.

import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image

# Available models: Wan-AI/Wan2.1-I2V-14B-480P-Diffusers, Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
model_id = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
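# Keep the VAE in float32; Wan's VAE decode is sensitive to reduced precision.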
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
scheduler = UniPCMultistepScheduler(prediction_type="flow_prediction", use_flow_sigmas=True, flow_shift=6.0)
pipe = WanImageToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.scheduler = scheduler
pipe.to("cuda")

# height, width = 480, 832
height, width = 480, 704
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg")
prompt = (
    "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in "
    "the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
)
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=3.0
).frames[0]
export_to_video(output, "output2.mp4", fps=15)
(attached video: output2.mp4)
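
For reference, loading a LoRA into a Wan pipeline should go through the standard load_lora_weights API that this PR wires up. A minimal sketch, assuming a hypothetical LoRA checkpoint repo and adapter name (neither is part of this PR):

import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Text-to-Video checkpoint; the 1.3B variant keeps the example lightweight.
model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Hypothetical LoRA repo id and adapter name, shown only to illustrate the loader API.
pipe.load_lora_weights("your-username/wan-t2v-example-lora", adapter_name="example")
pipe.set_adapters(["example"], adapter_weights=[0.9])

output = pipe(
    prompt="A cat walks on the grass, realistic style",
    negative_prompt="worst quality, low quality, blurred details",
    num_frames=33,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]
export_to_video(output, "lora_output.mp4", fps=15)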

a-r-r-o-w added the roadmap (Add to current release roadmap) label on Mar 3, 2025
a-r-r-o-w requested review from yiyixuxu and sayakpaul on Mar 3, 2025, 21:08
a-r-r-o-w marked this pull request as ready for review on Mar 3, 2025, 21:09
Labels: roadmap (Add to current release roadmap)
Project status: In Progress