HunyuanVideo Latent Feature Extraction Tool

This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.

Features

  • Support for various video formats and resolutions
  • Multi-GPU parallel processing for improved efficiency
  • Support for multiple aspect ratios
  • High-performance VAE model for feature extraction
  • Automatic skipping of already processed videos, so an interrupted run can be resumed

Usage

1. Configuration File

Input Dataset Format

The input video metadata file (meta_file.list) is a plain-text list of JSON file paths, one path per line. Each JSON file describes a single video.

The format of meta_file.list (e.g., ./assets/demo/i2v_lora/train_dataset/meta_file.list) is as follows:

/path/to/0.json
/path/to/1.json
/path/to/2.json
...

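IMPORTANT: Make sure each video's video_id is unique. Output files are named after the video_id ({video_id}.npy and json_path/{video_id}.json), so duplicate IDs would collide.

The format of each JSON file, such as /path/to/0.json (e.g., ./assets/demo/i2v_lora/train_dataset/meta_data.json), is as follows: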

{
  "video_path": "/path/to/video.mp4",
  "raw_caption": {
    "long caption": "Detailed description text of the video"
  }
}
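
As an illustration of the two formats above, the following minimal sketch writes one JSON file per video and a meta_file.list that points at them. The helper name `write_dataset` and the index-based JSON file names are assumptions made for this example, not part of the tool.

```python
import json
from pathlib import Path


def write_dataset(videos, out_dir):
    """Write one JSON file per video plus a meta_file.list (hypothetical helper).

    `videos` is a list of (video_path, caption) pairs.
    Returns the path of the generated meta_file.list.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    json_paths = []
    for index, (video_path, caption) in enumerate(videos):
        entry = {
            "video_path": video_path,
            "raw_caption": {"long caption": caption},
        }
        json_file = out / f"{index}.json"  # index-based names stay unique
        json_file.write_text(
            json.dumps(entry, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        json_paths.append(str(json_file.resolve()))

    # One JSON path per line, matching the meta_file.list format
    meta_file = out / "meta_file.list"
    meta_file.write_text("\n".join(json_paths) + "\n", encoding="utf-8")
    return meta_file
```

The returned path can then be used as the value of video_url_files in the configuration.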

Configure parameters in hyvideo/hyvae_extract/vae.yaml:

vae_path: "./ckpts/hunyuan-video-i2v-720p/vae" # VAE model path
video_url_files: "/path/to/meta_file.list"     # Video metadata file list
output_base_dir: "/path/to/output/directory"   # Output directory
sample_n_frames: 129                           # Number of frames to sample
target_size:                                   # Target size
  - bucket_size
  - bucket_size
enable_multi_aspect_ratio: True                # Enable multiple aspect ratios
use_stride: True                               # Use stride sampling

Bucket Size Reference

The target_size parameter defines the resolution bucket size. Here are the recommended values for different quality levels:

Quality   Bucket Size   Typical Resolution
720p      960           1280×720 or similar
540p      720           960×540 or similar
360p      480           640×360 or similar

When enable_multi_aspect_ratio is set to True, the system will use these bucket sizes as a base to generate multiple aspect ratio buckets. For optimal performance, choose a bucket size that balances quality and memory usage based on your hardware capabilities.
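
The values in the table are consistent with each bucket size being the side of a square that has the same pixel count as the typical resolution (for example, 960 × 960 = 1280 × 720 = 921,600 pixels). The sketch below works through that arithmetic. The function name `bucket_for_ratio` is hypothetical, and the tool's actual bucket generation may round or constrain sizes differently.

```python
import math


def bucket_for_ratio(bucket_size: int, ratio_w: int, ratio_h: int) -> tuple:
    """Return (width, height) with aspect ratio ratio_w:ratio_h and roughly
    the same pixel area as a bucket_size x bucket_size square.

    Illustrative only: an assumption about how a base bucket size could map
    to a non-square resolution, not the tool's confirmed algorithm.
    """
    area = bucket_size * bucket_size
    width = math.sqrt(area * ratio_w / ratio_h)
    height = area / width
    return round(width), round(height)


if __name__ == "__main__":
    for size in (960, 720, 480):
        print(size, bucket_for_ratio(size, 16, 9))
```

For a 16:9 video this reproduces the table: 960 gives 1280×720, 720 gives 960×540, and 480 gives 640×360.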

2. Run Extraction

# Set environment variables
export HOST_GPU_NUM=8  # Set the number of GPUs to use

# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh

3. Single GPU Run

cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'

Output Files

The program generates the following files in the specified output directory:

  1. {video_id}.npy - Latent feature array of the video
  2. json_path/{video_id}.json - JSON file containing video metadata, including:
    • video_id: Video ID
    • latent_shape: Shape of the latent features
    • video_path: Original video path
    • prompt: Video description/prompt
    • npy_save_path: Path where the latent features are saved

The resulting directory layout:

output_base_dir/
│
├── {video_id_1}.npy          # Latent feature array for video 1
├── {video_id_2}.npy          # Latent feature array for video 2
├── {video_id_3}.npy          # Latent feature array for video 3
│   ...
├── {video_id_n}.npy          # Latent feature array for video n
│
└── json_path/                # Directory containing metadata JSON files
    ├── {video_id_1}.json     # Metadata for video 1
    ├── {video_id_2}.json     # Metadata for video 2
    ├── {video_id_3}.json     # Metadata for video 3
    │   ...
    └── {video_id_n}.json     # Metadata for video n
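
To sanity-check an extraction run against this layout, a small script can confirm that every metadata JSON has a matching .npy file. This is a minimal sketch using only the standard library; the helper name `find_missing_latents` is hypothetical, and it assumes the directory layout shown above.

```python
import json
from pathlib import Path


def find_missing_latents(output_base_dir):
    """Return video_ids that have a metadata JSON but no matching .npy file.

    Hypothetical helper: assumes {video_id}.npy sits directly in
    output_base_dir and metadata sits in output_base_dir/json_path/.
    """
    base = Path(output_base_dir)
    missing = []
    for meta_path in sorted((base / "json_path").glob("*.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Fall back to the file name if the video_id field is absent
        video_id = str(meta.get("video_id", meta_path.stem))
        if not (base / f"{video_id}.npy").is_file():
            missing.append(video_id)
    return missing
```

An empty list means every metadata file has its latent array on disk.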

Advanced Configuration

Multiple Aspect Ratio Processing

When enable_multi_aspect_ratio is set to True, the system selects the target size whose aspect ratio is closest to that of the original video, rather than cropping every video to a single fixed size. This preserves more of the original video content.
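
The selection step can be pictured as a nearest-ratio lookup. The sketch below is an assumption about how such a lookup might work, not the tool's actual code: the function name `closest_bucket`, the candidate list, and the use of a plain absolute difference between ratios are all chosen for illustration.

```python
def closest_bucket(video_width, video_height, buckets):
    """Pick the (width, height) bucket whose aspect ratio is nearest to the
    video's aspect ratio.

    Hypothetical helper: `buckets` is any non-empty list of (width, height)
    candidates; the distance measure is a simple absolute ratio difference.
    """
    video_ratio = video_width / video_height
    return min(buckets, key=lambda wh: abs(wh[0] / wh[1] - video_ratio))


if __name__ == "__main__":
    # Example candidates only; the real bucket list is generated by the tool
    candidates = [(960, 960), (1280, 720), (720, 1280)]
    print(closest_bucket(1920, 1080, candidates))
    print(closest_bucket(1080, 1920, candidates))
```

With these example candidates, a 1920×1080 video maps to the 1280×720 bucket and a 1080×1920 video maps to 720×1280.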

Stride Sampling

When use_stride is set to True, the system automatically adjusts the sampling stride based on the video's frame rate:

  • When frame rate >= 50fps, stride is 2
  • When frame rate < 50fps, stride is 1
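
The stride rule above can be written as a short function. In this sketch, only the 50 fps threshold comes from the description; the helper names, the `start` offset, and the handling of videos that are too short are assumptions made for illustration.

```python
def sampling_stride(fps: float) -> int:
    """Stride rule from the description: 2 at 50 fps or above, otherwise 1."""
    return 2 if fps >= 50 else 1


def sampled_frame_indices(fps, total_frames, sample_n_frames=129, start=0):
    """Return the frame indices that would be read for one clip.

    Hypothetical helper: starting at `start` and truncating at the end of
    the video are assumptions, not confirmed behavior of the tool.
    """
    stride = sampling_stride(fps)
    stop = min(total_frames, start + sample_n_frames * stride)
    return list(range(start, stop, stride))


if __name__ == "__main__":
    print(sampled_frame_indices(60, 1000, sample_n_frames=5))
    print(sampled_frame_indices(30, 1000, sample_n_frames=5))
```

At 60 fps the sketch reads every second frame, and at 30 fps it reads consecutive frames, so a clip of sample_n_frames frames covers a more comparable time span across frame rates.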