
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior (CVPR 2025)

Zichen Tang¹, Yuan Yao², Miaomiao Cui², Liefeng Bo², Hongyu Yang¹
¹Beihang University     ²Alibaba Group

Text-guided 3D human generation has advanced with the development of efficient 3D representations and 2D-lifting methods such as Score Distillation Sampling (SDS). However, current methods suffer from prolonged training times and often produce results that lack fine facial and garment details. In this paper, we propose GaussianIP, an effective two-stage framework for generating identity-preserving realistic 3D humans from text and image prompts. Our core insight is to leverage human-centric knowledge to facilitate the generation process. In stage 1, we propose a novel Adaptive Human Distillation Sampling (AHDS) method to rapidly generate a 3D human that maintains high identity consistency with the image prompt and achieves a realistic appearance. Compared to traditional SDS methods, AHDS better aligns with the human-centric generation process, enhancing visual quality with notably fewer training steps. To further improve the visual quality of the face and clothes regions, we design a View-Consistent Refinement (VCR) strategy in stage 2. Specifically, it iteratively produces detail-enhanced versions of the multi-view images from stage 1, ensuring 3D texture consistency across views via mutual attention and distance-guided attention fusion. A polished 3D human can then be obtained by directly performing reconstruction with the refined images. Extensive experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency, particularly in generating identity-preserving results.
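To make the distance-guided attention fusion idea above concrete, here is a minimal illustrative sketch. It assumes refined reference-view attention features are blended with the current view's self-attention output using weights that decay with angular distance between camera views; every name, shape, and the 50/50 mixing ratio below are assumptions for illustration, not the repository's actual API.

import torch

def distance_guided_fusion(attn_self, attn_refs, ref_angles, cur_angle, tau=30.0):
    """Blend the current view's self-attention output with reference views',
    weighting each reference by angular proximity (illustrative sketch)."""
    # attn_self: (tokens, dim); attn_refs: (num_refs, tokens, dim); angles in degrees
    d = torch.tensor([abs(a - cur_angle) for a in ref_angles])
    w = torch.softmax(-d / tau, dim=0)                 # closer views get larger weight
    fused_ref = (w[:, None, None] * attn_refs).sum(0)  # distance-weighted reference features
    return 0.5 * attn_self + 0.5 * fused_ref           # equal self/reference mixing (assumed)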

Installation

# clone the github repo
git clone https://github.com/silence-tang/GaussianIP.git
cd GaussianIP

# install torch
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

# install other dependencies
pip install -r requirements.txt

# install a modified gaussian splatting (+ depth, alpha rendering)
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
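Optionally, a quick sanity check that PyTorch sees the GPU and that the rasterizer extension built correctly (assuming the extension is importable as diff_gaussian_rasterization, as in the upstream fork):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import diff_gaussian_rasterization; print('rasterizer import OK')"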

Text Prompts Gallery

The text prompts used for the qualitative and ablation visual results are listed in gallery_text_prompts.txt.

Image Prompts Gallery

The image prompts used for the qualitative and ablation visual results are listed in gallery_image_prompts.txt.

Pretrained Models

Please prepare the pre-trained models listed below before training:

After downloading the above models, specify their paths in configs/exp.yaml and threestudio/models/guidance/refine.py.
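For orientation, the relevant entries in configs/exp.yaml might look like the fragment below; the actual key names depend on the repository's config schema, so treat every key here as a hypothetical placeholder and defer to the shipped config:

# configs/exp.yaml (illustrative fragment; every key name is a hypothetical placeholder)
system:
  guidance:
    pretrained_model_name_or_path: "/path/to/base-diffusion-model"
    ip_adapter_ckpt_path: "/path/to/ip-adapter.bin"
    image_encoder_path: "/path/to/image-encoder"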

Train

First, specify TEXT_PROMPT and IMAGE_PROMPT in run.sh. Then run:

bash run.sh
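For reference, run.sh is expected to look roughly like the sketch below. The launch command follows the threestudio convention this repo builds on; the exact flags and key names (in particular system.image_prompt) are assumptions, so defer to the actual script:

# illustrative sketch of run.sh; flag/key names are assumptions, check the real script
TEXT_PROMPT="a woman wearing a white t-shirt and jeans"   # hypothetical prompt
IMAGE_PROMPT="assets/identity.png"                        # hypothetical identity image path

python launch.py --config configs/exp.yaml --train \
    system.prompt_processor.prompt="$TEXT_PROMPT" \
    system.image_prompt="$IMAGE_PROMPT"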

Animation

Animation-Related Installation

# for GUI
pip install dearpygui

# cubvh
pip install git+https://github.com/ashawkey/cubvh

# a modified gaussian splatting (+ depth, alpha rendering)
git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization

# nvdiffrast
pip install git+https://github.com/NVlabs/nvdiffrast/

# kiuikit
pip install -U git+https://github.com/ashawkey/kiuikit

# smplx
pip install smplx[all]
# please also download SMPL-X files to ./smplx_models/smplx/*.pkl
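To verify the SMPL-X files are in place, an optional minimal check using the smplx package's create helper; the directory layout matches the download note above, and ext='pkl' selects the .pkl files:

import smplx

# expects ./smplx_models/smplx/SMPLX_NEUTRAL.pkl per the download note above
model = smplx.create('./smplx_models', model_type='smplx', gender='neutral', ext='pkl')
print(type(model).__name__)  # SMPLX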

Animation-Related Usage

Motions should follow the SMPL-X body pose format (21 joints) and can be read by:

import numpy as np

motion = np.load('content/amass_test_17.npz')['poses'][:, 1:22, :3]
# motion = np.load('content/Aeroplane_FW_part9.npz')['poses'][:, 1:22, :3]
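The slice keeps joints 1–21 (the 21 SMPL-X body joints, skipping the global root at index 0). As a quick sanity check, continuing from the snippet above and assuming the bundled .npz stores poses as (num_frames, num_joints, 3), which the indexing implies:

# one axis-angle triplet per body joint, per frame
assert motion.ndim == 3 and motion.shape[1:] == (21, 3)
print(f'{motion.shape[0]} frames of 21-joint body pose')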

Then, perform the zero-shot animating by:

# visualize in gui
python animation.py --ply <path/to/ply> --motion <path/to/motion> --gui

# play motion and save to videos/xxx.mp4
python animation.py --ply <path/to/ply> --motion <path/to/motion> --play

# also self-rotate during playing
python animation.py --ply <path/to/ply> --motion <path/to/motion> --play --rotate

Acknowledgement

This work is inspired by and builds upon numerous outstanding research efforts and open-source projects, including Threestudio, 3DGS, diff-gaussian-rasterization, HumanGaussian, and IP-Adapter. We are deeply grateful to all the authors for their contributions and for sharing their work openly!

Notes

We train at a resolution of 1024×1024 with a batch size of 4. The whole optimization takes around 40 minutes on a single NVIDIA V100 (32 GB) or RTX 3090 (24 GB) GPU.

Citation

If you find this repository or our work helpful in your research, please consider citing the paper and giving the repo a star.

@misc{tang2025gaussianipidentitypreservingrealistic3d,
    title={GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior},
    author={Zichen Tang and Yuan Yao and Miaomiao Cui and Liefeng Bo and Hongyu Yang},
    year={2025},
    eprint={2503.11143},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2503.11143}
}
