UniVST: A Unified Framework for Training-free Localized Video Style Transfer [Official PyTorch Implementation]
1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing,
Ministry of Education of China, Xiamen University, China.
2 Kunlun Skywork AI.
• 2024.10.26: 🔥 The paper of UniVST has been submitted to arXiv.
• 2025.01.01: 🔥 The official code of UniVST has been released.
• 2025.06.01: 🔥 The project page of UniVST is now available.
We propose UniVST, a unified framework for training-free localized video style transfer based on diffusion models. UniVST first applies DDIM inversion to the original video and the style image to obtain their initial noise, and integrates Point-Matching Mask Propagation to generate masks for the object regions. It then performs AdaIN-Guided Localized Video Stylization with a three-branch architecture for information interaction. Moreover, Sliding-Window Consistent Smoothing is incorporated into the denoising process to enhance temporal consistency in the latent space. The overall framework is illustrated as follows:
# Clone the repository
git clone https://github.com/QuanjianSong/UniVST.git
# Install with requirements.txt
conda create -n UniVST python=3.9
conda activate UniVST
pip install -r requirements.txt
# Or install with environment.yaml
conda env create -f environment.yaml
python content_ddim_inv.py --content_path ./examples/content/libby \
--output_dir ./output
Then, you will find the content inversion result in ./output/content.
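The command above processes a single video. To invert several videos in one go, a minimal shell sketch like the following could loop over a folder. It is not part of the repository: `invert_all_contents` and the `DRY_RUN` switch are hypothetical names, and it assumes each subdirectory of the root holds the frames of one video, as `./examples/content/libby` appears to. Only the two flags documented above are used.

```bash
# Hypothetical helper (not shipped with UniVST): run content_ddim_inv.py on
# every subdirectory of CONTENT_ROOT, assuming one video per subdirectory.
# With DRY_RUN=1 the commands are printed instead of executed.
invert_all_contents() {
  local content_root="${1:-./examples/content}"
  local output_dir="${2:-./output}"
  local video_dir
  for video_dir in "$content_root"/*/; do
    [ -d "$video_dir" ] || continue    # the glob matched nothing
    video_dir="${video_dir%/}"         # drop the trailing slash
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "python content_ddim_inv.py --content_path $video_dir --output_dir $output_dir"
    else
      python content_ddim_inv.py --content_path "$video_dir" \
        --output_dir "$output_dir" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 invert_all_contents ./examples/content ./output` prints one command per video without running anything; dropping `DRY_RUN=1` executes them in order and stops at the first failure.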
python mask_propagation.py --feature_path ./output/features/libby/inversion_feature_301.pt \
--mask_path ./examples/mask/libby.png \
--output_dir ./output
Then, you will find the mask propagation result in ./output/mask.
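For a video other than libby, the same call can be assembled from the video name. The sketch below is a hypothetical helper, not a documented interface. It assumes the feature file, presumably written by the content inversion step, keeps the name `inversion_feature_301.pt` for every video (only the libby case is shown above) and that the input mask is stored as `./examples/mask/<video>.png`. It checks that both files exist before invoking the script, so a missing prerequisite produces a short message rather than a Python traceback.

```bash
# Hypothetical helper: build and run the mask_propagation.py call for VIDEO,
# following the libby example's path layout (assumed, not documented).
# With DRY_RUN=1 the command is only printed.
propagate_mask() {
  local video="$1"
  local output_dir="${2:-./output}"
  local feature="$output_dir/features/$video/inversion_feature_301.pt"
  local mask="./examples/mask/$video.png"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "python mask_propagation.py --feature_path $feature --mask_path $mask --output_dir $output_dir"
    return 0
  fi
  # Fail early if a prerequisite is missing.
  [ -f "$feature" ] || { echo "missing feature file: $feature" >&2; return 1; }
  [ -f "$mask" ] || { echo "missing mask image: $mask" >&2; return 1; }
  python mask_propagation.py --feature_path "$feature" \
    --mask_path "$mask" --output_dir "$output_dir"
}
```

With `DRY_RUN=1`, `propagate_mask libby` prints exactly the command shown above.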
python style_ddim_inv.py --style_path ./examples/style/style1.png \
--output_dir ./output
Then, you will find the style inversion result in ./output/style.
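The style transfer step reads this result through `--style_path ./output/style/style1/inversion`, so the inversion directory appears to be named after the style image file without its extension. A small hypothetical helper capturing that inferred convention might look like the sketch below. It has been checked only against the single style1 example, not against the repository code.

```bash
# Hypothetical helper: map a style image to the inversion directory that
# video_style_transfer.py expects via --style_path.
# Assumption (inferred from style1.png -> ./output/style/style1/inversion):
# NAME.ext is written to OUTPUT_DIR/style/NAME/inversion.
style_inversion_dir() {
  local style_img="$1"
  local output_dir="${2:-./output}"
  local name
  name="$(basename "$style_img")"
  name="${name%.*}"                    # strip the extension: style1.png -> style1
  echo "$output_dir/style/$name/inversion"
}
```

For instance, `style_inversion_dir ./examples/style/style1.png` yields `./output/style/style1/inversion`, the value passed to the final command.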
python video_style_transfer.py --inv_path ./output/content/libby/inversion \
--mask_path ./output/mask/libby \
--style_path ./output/style/style1/inversion \
--output_dir ./output
Then, you will find the edited result in ./output/edit.
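Judging by the flags above, the four steps chain through the output directory: content inversion and mask propagation depend only on the video, while style inversion and style transfer depend on the style image. The sketch below is a driver that exploits this by running the first two steps once and the last two once per style. It is not shipped with UniVST; `univst_pipeline`, `univst_run` and the `DRY_RUN` switch are hypothetical, and the path conventions (feature file name, style directory naming) are inferred from the libby/style1 example only.

```bash
# Print the command when DRY_RUN=1, otherwise execute it.
univst_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: univst_pipeline VIDEO STYLE_IMAGE [STYLE_IMAGE ...]
# Assumes ./examples/content/VIDEO and ./examples/mask/VIDEO.png exist.
univst_pipeline() {
  local video="$1"
  shift                                # remaining args: style images
  local out="./output"
  local style_img style

  # Steps 1-2: once per video.
  univst_run python content_ddim_inv.py \
    --content_path "./examples/content/$video" --output_dir "$out" || return 1
  univst_run python mask_propagation.py \
    --feature_path "$out/features/$video/inversion_feature_301.pt" \
    --mask_path "./examples/mask/$video.png" --output_dir "$out" || return 1

  # Steps 3-4: once per style image.
  for style_img in "$@"; do
    style="$(basename "$style_img")"
    style="${style%.*}"                # style1.png -> style1 (assumed naming)
    univst_run python style_ddim_inv.py \
      --style_path "$style_img" --output_dir "$out" || return 1
    univst_run python video_style_transfer.py \
      --inv_path "$out/content/$video/inversion" \
      --mask_path "$out/mask/$video" \
      --style_path "$out/style/$style/inversion" \
      --output_dir "$out" || return 1
  done
}
```

For instance, `DRY_RUN=1 univst_pipeline libby ./examples/style/style1.png` prints the four commands listed in this section, in order; passing more style images appends one style inversion and one transfer command per image.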
If you find this code helpful for your research, please cite:
@article{song2024univst,
title={UniVST: A Unified Framework for Training-free Localized Video Style Transfer},
author={Song, Quanjian and Lin, Mingbao and Zhan, Wengyi and Yan, Shuicheng and Cao, Liujuan and Ji, Rongrong},
journal={arXiv preprint arXiv:2410.20084},
year={2024}
}