I have been a researcher at Stepfun since 2024. My research interests include video generation, avatar synthesis and driving, multi-modal models, and representation learning.
Before joining Stepfun, I was a researcher at Xiaobing for nearly three years, working closely with Yu Deng and Baoyuan Wang. Before that, I worked at OPPO Research Institute for three years, where my research results were applied to the camera software of OPPO mobile phones as the basic face algorithm.
News
2024/08/08   Had one paper accepted by the ECCV 2024 workshop EEC about avatar agents (AgentAvatar).
2024/07/01   Had one paper accepted by ECCV 2024 about 4D avatar synthesis (Portrait4D-v2).
2024/02/27   Had two papers accepted by CVPR 2024: one is about 4D avatar synthesis (Portrait4D), the other is about unconstrained virtual try-on (PICTURE).
2023/07/14   Had one paper accepted by ICCV 2023 about talking head synthesis (TH-PAD).
2023/07/10   The code and model for our CVPR 2023 work PD-FGC have been released. Check it out!
2023/02/28   Had one paper accepted by CVPR 2023 about talking head synthesis (PD-FGC).
Publications
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng, Duomin Wang, Baoyuan Wang
2024 European Conference on Computer Vision, ECCV 2024,
[PDF][Project][Code]
We learn a lifelike 4D head synthesizer by creating pseudo multi-view videos from monocular ones as supervision.
PICTURE: PhotorealistIC Virtual Try-on from UnconstRained dEsigns
Shuliang Ning, Duomin Wang, Yipeng Qin, Zirong Jin, Baoyuan Wang, Xiaoguang Han
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024,
[PDF][Project][Code][BibTeX]
We propose a novel virtual try-on from unconstrained designs (ucVTON) task to enable photorealistic synthesis of personalized composite clothing on an input human image.
Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng, Duomin Wang, Xiaohang Ren, Xingyu Chen, Baoyuan Wang
2024 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2024,
[PDF][Project][Code][BibTeX]
We propose a one-shot 4D head synthesis approach for high-fidelity 4D head avatar reconstruction, trained on large-scale synthetic data.
Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents
Duomin Wang, Bin Dai, Yu Deng, Baoyuan Wang
2024 European Conference on Computer Vision, Workshop on EEC, ECCVW 2024,
[PDF][Project][Code][BibTeX]
We introduce a system that harnesses LLMs to produce a series of detailed text descriptions of the avatar agents' facial motions. These descriptions are then processed by our task-agnostic driving engine into motion token sequences, which are subsequently converted into continuous motion embeddings that are further consumed by our standalone neural-based renderer to generate the final photorealistic avatar animations.
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang
2023 IEEE International Conference on Computer Vision, ICCV 2023,
[PDF][Project][Code(coming soon)][BibTeX]
We introduce a simple and novel framework for one-shot audio-driven talking head generation. Unlike prior works that require additional driving sources for controlled synthesis in a deterministic manner, we instead probabilistically sample all the holistic lip-irrelevant facial motions (e.g., pose, expression, blink, gaze) to semantically match the input audio while still maintaining both the photo-realism of audio-lip synchronization and the overall naturalness.
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis
Duomin Wang, Yu Deng, Zixin Yin, Heung-Yeung Shum, Baoyuan Wang
2023 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2023,
[PDF][Project][Code][BibTeX]
We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze and blink, head pose, and emotional expression.
We represent different motions via disentangled latent representations and leverage an image generator to synthesize talking heads from them.