Conference Paper: MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Title: MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Authors: Lin, Yukang; Fung, Hokit; Xu, Jianjin; Ren, Zeping; Lau, Adela Sau Mui; Yin, Guosheng; Li, Xiu
Issue Date: 11-Jun-2025
Abstract

Recent portrait animation methods have made significant strides in generating realistic lip synchronization. However, they often lack explicit control over head movements and facial expressions, and cannot produce videos from multiple viewpoints, resulting in less controllable and expressive animations. Moreover, text-guided portrait animation remains underexplored, despite its user-friendly nature. We present a novel two-stage text-guided framework, MVPortrait (Multi-view Vivid Portrait), to generate expressive multi-view portrait animations that faithfully capture the described motion and emotion. MVPortrait is the first to introduce FLAME as an intermediate representation, effectively embedding facial movements, expressions, and view transformations within its parameter space. In the first stage, we separately train the FLAME motion and emotion diffusion models based on text input. In the second stage, we train a multi-view video generation model conditioned on a reference portrait image and multi-view FLAME rendering sequences from the first stage. Experimental results show that MVPortrait outperforms existing methods in terms of motion and emotion control, as well as view consistency. Furthermore, by leveraging FLAME as a bridge, MVPortrait becomes the first controllable portrait animation framework that is compatible with text, speech, and video as driving signals.
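To make the abstract's two-stage design concrete, the sketch below restates the described inference flow in Python. It is a minimal illustration only: every identifier in it (FlameSequence, sample_motion, sample_emotion, render_views, generate_video) is a hypothetical placeholder invented here, not the authors' code or API, and the placeholder bodies stand in for what the paper describes as diffusion models and a multi-view video generator.

```python
# Hypothetical sketch of the two-stage MVPortrait pipeline as described in
# the abstract. All names and bodies are placeholders for illustration; the
# real stages are text-conditioned diffusion models and a multi-view video
# generation model, which are not part of this record.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlameSequence:
    """Stand-in for per-frame FLAME parameters: head pose (motion) and
    expression (emotion) both live in FLAME's parameter space."""
    pose: List[float] = field(default_factory=list)
    expression: List[float] = field(default_factory=list)


def sample_motion(text: str, num_frames: int) -> List[float]:
    """Stage 1a (placeholder): text-conditioned FLAME motion diffusion
    model; returns one head-pose parameter per frame."""
    return [0.0] * num_frames


def sample_emotion(text: str, num_frames: int) -> List[float]:
    """Stage 1b (placeholder): separately trained text-conditioned FLAME
    emotion diffusion model; returns one expression parameter per frame."""
    return [0.0] * num_frames


def render_views(seq: FlameSequence, views: List[str]) -> Dict[str, list]:
    """Render the FLAME sequence from each requested viewpoint; view
    changes are transformations in FLAME parameter space (placeholder)."""
    return {v: list(zip(seq.pose, seq.expression)) for v in views}


def generate_video(reference_image: str,
                   renderings: Dict[str, list]) -> Dict[str, int]:
    """Stage 2 (placeholder): multi-view video model conditioned on the
    reference portrait and the FLAME rendering sequences; here it only
    reports how many frames each view would contain."""
    return {view: len(frames) for view, frames in renderings.items()}


if __name__ == "__main__":
    n = 16  # frames to animate
    seq = FlameSequence(
        pose=sample_motion("slowly turns head to the left", n),
        expression=sample_emotion("gradually becomes joyful", n),
    )
    flame_renders = render_views(seq, views=["front", "left", "right"])
    videos = generate_video("reference_portrait.png", flame_renders)
    print(videos)  # {'front': 16, 'left': 16, 'right': 16}
```

Separating the motion and emotion samplers in stage one mirrors the abstract's claim that the two diffusion models are trained independently before their FLAME outputs drive the shared second-stage video model.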


Persistent Identifier: http://hdl.handle.net/10722/358646

 

DC Field: Value
dc.contributor.author: Lin, Yukang
dc.contributor.author: Fung, Hokit
dc.contributor.author: Xu, Jianjin
dc.contributor.author: Ren, Zeping
dc.contributor.author: Lau, Adela Sau Mui
dc.contributor.author: Yin, Guosheng
dc.contributor.author: Li, Xiu
dc.date.accessioned: 2025-08-13T07:47:11Z
dc.date.available: 2025-08-13T07:47:11Z
dc.date.issued: 2025-06-11
dc.identifier.uri: http://hdl.handle.net/10722/358646
dc.description.abstract: Recent portrait animation methods have made significant strides in generating realistic lip synchronization. However, they often lack explicit control over head movements and facial expressions, and cannot produce videos from multiple viewpoints, resulting in less controllable and expressive animations. Moreover, text-guided portrait animation remains underexplored, despite its user-friendly nature. We present a novel two-stage text-guided framework, MVPortrait (Multi-view Vivid Portrait), to generate expressive multi-view portrait animations that faithfully capture the described motion and emotion. MVPortrait is the first to introduce FLAME as an intermediate representation, effectively embedding facial movements, expressions, and view transformations within its parameter space. In the first stage, we separately train the FLAME motion and emotion diffusion models based on text input. In the second stage, we train a multi-view video generation model conditioned on a reference portrait image and multi-view FLAME rendering sequences from the first stage. Experimental results show that MVPortrait outperforms existing methods in terms of motion and emotion control, as well as view consistency. Furthermore, by leveraging FLAME as a bridge, MVPortrait becomes the first controllable portrait animation framework that is compatible with text, speech, and video as driving signals.
dc.language: eng
dc.relation.ispartof: IEEE Conference on Computer Vision and Pattern Recognition 2025 (11/06/2025-15/06/2025, Nashville, TN)
dc.title: MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
dc.type: Conference_Paper
dc.identifier.doi: 10.48550/arXiv.2503.19383
