3D facial generation is a critical task for immersive multimedia applications, where the key challenge lies in synthesizing vivid expression meshes with corresponding dynamic textures. Current approaches still struggle with geometric-textural coherence and dynamic reflectance generation. To address these challenges, we present ExpDiff, a framework that generates expression meshes and dynamic BRDF textures from a single neutral-expression face. To achieve effective generation, we propose an attention-based diffusion model that learns the relationships among different expressions. To ensure correspondence between geometry and texture, we introduce a unified representation that explicitly models geometric-textural interaction, and encode it into a latent space with models trained on large-scale datasets to preserve generalization. To achieve semantically coherent and physically consistent generation, we guide the denoising direction with specially designed textual prompts. We further construct and publicly release two novel dynamic expression datasets for training, setting new standards for asset quality (J-Reflectance) and identity diversity (FFHQ-BRDFExp), to advance the community. Extensive experiments demonstrate our method's superior performance in photorealistic facial animation synthesis.
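To make the prompt-guided denoising described above concrete, the sketch below shows, in plain PyTorch, what a single classifier-free-guided denoising step over a joint geometry-texture latent might look like. This is an illustrative assumption, not ExpDiff's actual architecture or code: every name here (`TinyDenoiser`, `guided_denoise_step`, `guidance_scale`, the latent and text-embedding dimensions) is a hypothetical placeholder.

```python
# Illustrative sketch only (NOT the authors' implementation): one
# classifier-free-guided denoising step for a text-conditioned latent
# diffusion model over a joint geometry-texture latent.
import torch
import torch.nn as nn


class TinyDenoiser(nn.Module):
    """Stand-in for an attention-based denoiser: predicts noise from a
    noisy latent, a timestep, and a text-prompt embedding."""

    def __init__(self, latent_dim=256, text_dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.out = nn.Linear(latent_dim, latent_dim)

    def forward(self, z_t, t, text_emb):
        # Cross-attend the noisy latent to the projected text embedding.
        q = z_t.unsqueeze(1)                            # (B, 1, D)
        kv = self.text_proj(text_emb).unsqueeze(1)      # (B, 1, D)
        h, _ = self.attn(q, kv, kv)
        return self.out(h.squeeze(1) + t.unsqueeze(-1))  # predicted noise


@torch.no_grad()
def guided_denoise_step(model, z_t, t, text_emb, null_emb, guidance_scale=3.0):
    """Classifier-free guidance: blend conditional and unconditional predictions
    to steer the denoising direction toward the textual prompt."""
    eps_cond = model(z_t, t, text_emb)
    eps_uncond = model(z_t, t, null_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    model = TinyDenoiser()
    z_t = torch.randn(2, 256)      # noisy geometry-texture latent (assumed shape)
    t = torch.full((2,), 0.5)      # normalized timestep
    text = torch.randn(2, 256)     # embedding of the expression prompt
    null = torch.zeros(2, 256)     # "empty prompt" embedding
    eps = guided_denoise_step(model, z_t, t, text, null)
    print(eps.shape)               # torch.Size([2, 256])
```

In this kind of setup, raising the guidance scale pushes samples more strongly toward the prompt semantics at the cost of diversity; the actual conditioning and guidance details used by ExpDiff may differ.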
@inproceedings{cheng2025expdiff,
  title={ExpDiff: Generating High-fidelity Dynamic Facial Expressions with BRDF Textures via Diffusion Model},
  author={Cheng, Yuhao and Li, Xuanchen and Ren, Xingyu and Chen, Zhuo and Yang, Xiaokang and Yan, Yichao},
  year={2025}
}