GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting
for Real-time Human Novel View Synthesis

CVPR 2024 Highlight


Shunyuan Zheng†,1, Boyao Zhou2, Ruizhi Shao2, Boning Liu2, Shengping Zhang✉,1,3, Liqiang Nie1, Yebin Liu2

1Harbin Institute of Technology    2Tsinghua University    3Peng Cheng Laboratory
✉Corresponding author     †Work done during an internship at Tsinghua University

Abstract


We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods, which require per-subject optimization, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module that lifts the 2D parameter maps to 3D space. The proposed framework is fully differentiable, and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving a far higher rendering speed.
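To make the lifting step concrete, here is a minimal PyTorch sketch of how a predicted depth map can be unprojected into per-pixel 3D Gaussian centers, assuming a standard pinhole camera model. The function name and calling convention are illustrative, not the released API.

import torch

def unproject_depth_to_points(depth, K, c2w):
    # Lift a depth map (H, W) to world-space points (H*W, 3).
    # K: (3, 3) camera intrinsics; c2w: (4, 4) camera-to-world extrinsics.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype), indexing="ij")
    # Homogeneous pixel coordinates scaled by depth give camera-space points.
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)
    cam = (torch.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Transform camera-space points to world space.
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=-1)
    return (c2w @ cam_h.T).T[:, :3]

Each pixel of a source view thus yields one Gaussian center, which is what makes the regression pixel-wise and optimization-free at test time.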


Free View Rendering


Data collected by us
Data from DNA-Rendering

Live Demo


Live demo handling challenging hairstyles, human-object interaction, and multi-person scenarios

Method


Overview of GPS-Gaussian. Given RGB images of a human-centered scene captured from sparse camera views and a target novel viewpoint, we select the two source views adjacent to the target on which to formulate our Gaussian representation. We extract image features and then perform iterative depth estimation. For each source view, the depth map and the RGB image serve as a 3D position map and a color map, respectively, while the remaining 3D Gaussian parameters are predicted in a pixel-wise manner. The Gaussian parameter maps, defined on the 2D image planes of both views, are then unprojected to 3D space and aggregated for novel view rendering. The fully differentiable framework enables joint training of all networks.
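As a rough sketch of this pixel-wise formulation, the snippet below turns one source view's maps into a set of 3D Gaussians and aggregates the two views, reusing unproject_depth_to_points from the sketch above. The 8-channel parameter layout (quaternion, scale, opacity) and the activation functions follow standard 3D Gaussian Splatting conventions; they are our assumptions, not the paper's exact specification.

import torch
import torch.nn.functional as F

def gaussians_from_view(rgb, depth, raw_params, K, c2w):
    # rgb:        (3, H, W) source image, used directly as the color map
    # depth:      (H, W) estimated depth map, serving as the 3D position map
    # raw_params: (8, H, W) pixel-wise regression output, assumed layout:
    #             rotation (4) + scale (3) + opacity (1)
    xyz = unproject_depth_to_points(depth, K, c2w)  # sketch above
    p = raw_params.reshape(8, -1).T                 # (H*W, 8)
    return {
        "xyz": xyz,                                 # Gaussian centers
        "color": rgb.reshape(3, -1).T,              # colors from the image
        "rotation": F.normalize(p[:, :4], dim=-1),  # unit quaternions
        "scale": torch.exp(p[:, 4:7]),              # positive scales
        "opacity": torch.sigmoid(p[:, 7:]),         # opacity in (0, 1)
    }

# Gaussians unprojected from the two adjacent source views are simply
# concatenated before being splatted to the target viewpoint.
def merge_views(g_left, g_right):
    return {k: torch.cat([g_left[k], g_right[k]], dim=0) for k in g_left}

Because the splatting renderer is differentiable, a rendering loss on the novel view can back-propagate through both the parameter regression and the depth estimation, which is what allows the joint training described above.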


Demo Video


If the video does not play, please click here to watch it.

Citation


@inproceedings{zheng2024gpsgaussian,
  title={GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis},
  author={Zheng, Shunyuan and Zhou, Boyao and Shao, Ruizhi and Liu, Boning and Zhang, Shengping and Nie, Liqiang and Liu, Yebin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}