Topo4D++: Realistic Physical-based 4D Head Capture with Topology-Preserving Gaussian Splatting and Expression Priors

Yuhao Cheng1, Xuanchen Li1, Xingyu Ren1, Haoyu Wang1, Haozhe Jia2, Di Xu2, Wenhan Zhu3,
Bingbing Ni1, Yichao Yan1†
2Huawei Cloud Computing Technologies Co., Ltd
†Corresponding author
Teaser

Example results of our Topo4D++. Our method produces temporally consistent topological head meshes with high-fidelity 8K textures from calibrated multi-view videos. The captured 4D models can be applied to retargeting and relighting.

Abstract

Recent years have seen significant advances in high-quality 3D face reconstruction, but challenges remain in 4D face asset acquisition. 4D head capture aims to generate dynamic meshes with a shared topology and corresponding UV maps from multi-view videos; it is widely used in film and games for its ability to simulate facial muscle movements and recover dynamic wrinkles such as pore squeezing. The industry typically registers topological heads via multi-view stereo followed by non-rigid alignment. However, this approach is error-prone and relies heavily on artists' time-consuming manual processing. To simplify this process, we propose Topo4D++, a novel framework for automatic geometry and texture generation that optimizes densely aligned 4D heads and 8K BRDF maps directly from calibrated multi-view videos. Specifically, we first represent the facial models as a set of dynamic 3D Gaussians with fixed topology, in which the Gaussian centers are bound to the mesh vertices. We then optimize geometry and texture alternately, frame by frame, for dynamic head capture while maintaining temporal topology stability. Expression priors are employed in this process to provide better initialization for extreme expressions. From the learned Gaussians, we extract dynamic facial meshes with a regular wiring arrangement and high-fidelity textures with pore-level details, as well as displacement maps for detailed geometry reconstruction. Finally, we obtain BRDF texture maps for physically based rendering under different lighting conditions. We also propose JHead, a novel and comprehensive benchmark, to enable more effective evaluation of 4D head capture methods. Extensive experiments show that our method outperforms current state-of-the-art face reconstruction methods in the quality of both meshes and textures.

Pipeline

Overall pipeline of our Topo4D++ framework. (a) We initialize Gaussian attributes and establish topological correspondence with the startup mesh. (b) Taking one frame as an example, the Gaussian mesh is first initialized with AU-based expression priors; the geometry-related attributes of the previous frame's Gaussian mesh are then optimized on the current frame under a set of topology-aware loss terms. (c) We align the Gaussian surface with the rendered surface via Gaussian normal expansion to extract more precise meshes. (d) To learn pore-level, ultra-high-resolution texture and displacement maps, we build a dense mesh by densifying Gaussians in UV space. (e) Finally, our conversion model generates albedo maps and specular maps for physically based rendering.
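To make the topology-preserving idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of the two properties the pipeline relies on: one Gaussian center per mesh vertex, and a frame-by-frame loop where each frame's optimized result initializes the next while the face list never changes. The observation target and loss are toy placeholders standing in for the real photometric, topology-aware objectives.

```python
import numpy as np

# Toy sizes; real face meshes have tens of thousands of vertices.
V = 5
rng = np.random.default_rng(0)

vertices = rng.standard_normal((V, 3))    # startup mesh vertices
faces = np.array([[0, 1, 2], [2, 3, 4]])  # fixed topology: never edited

# One Gaussian per vertex: the Gaussian center IS the vertex position,
# so optimizing centers directly deforms the mesh.
gaussian_centers = vertices.copy()

def optimize_frame(centers, lr=0.1, steps=10):
    """Placeholder for per-frame optimization: gradient steps on a fake
    quadratic loss pulling centers toward this frame's 'observation'."""
    target = centers + 0.05               # stand-in for image evidence
    for _ in range(steps):
        grad = centers - target           # grad of 0.5 * ||c - t||^2
        centers = centers - lr * grad
    return centers

# Frame-by-frame loop: the previous frame's result initializes the next,
# which keeps vertex ordering and connectivity temporally stable.
meshes = []
for frame in range(3):
    gaussian_centers = optimize_frame(gaussian_centers)
    meshes.append((gaussian_centers.copy(), faces))

# Topology is identical in every extracted frame.
assert all(m[1] is faces for m in meshes)
```

Because every frame reuses the same `faces` array and vertex ordering, the extracted sequence is densely aligned by construction; only the vertex positions (Gaussian centers) move.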

Results

Reconstruction Results

Topo4D++ generates dynamic, temporally consistent meshes and corresponding 8K texture maps with pore-level details from calibrated multi-view videos.

Topo4D++ captures subtle facial changes and various extreme expressions, faithfully representing muscle tremors and dynamic wrinkles.

Comparisons

Geometry Comparison

Texture Comparison

Comparison on Multiface

BibTeX


@inproceedings{cheng2025topo4d++,
  title={Topo4D++: Realistic Physical-based 4D Head Capture with Topology-Preserving Gaussian Splatting and Expression Priors},
  author={Cheng, Yuhao and Li, Xuanchen and Ren, Xingyu and Wang, Jianyu and Jia, Haozhe and Xu, Di and Zhu, Wenhan and Ni, Bingbing and Yan, Yichao},
  year={2025}
}