Appears in Collections: Article: RIGID: Recurrent GAN Inversion and Editing of Real Face Videos and Beyond
Field | Value |
---|---|
Title | RIGID: Recurrent GAN Inversion and Editing of Real Face Videos and Beyond |
Authors | Xu, Yangyang; He, Shengfeng; Wong, Kwan-Yee K.; Luo, Ping |
Issue Date | 13-Jan-2025 |
Citation | International Journal of Computer Vision, 2025 |
DOI | 10.1007/s11263-024-02329-8 |
Abstract | GAN inversion is essential for harnessing the editability of GANs in real images, yet existing methods that invert video frames individually often yield temporally inconsistent results. To address this issue, we present a unified recurrent framework, Recurrent vIdeo GAN Inversion and eDiting (RIGID), designed to enforce temporally coherent GAN inversion and facial editing in real videos explicitly and simultaneously. Our approach models temporal relations between current and previous frames in three ways: (1) by maximizing inversion fidelity and consistency through learning a temporally compensated latent code and spatial features, (2) by disentangling high-frequency incoherent noises from the latent space, and (3) by introducing an in-between frame composition constraint to eliminate inconsistency after attribute manipulation, ensuring that each frame is a direct composite of its neighbors. Compared to existing video- and attribute-specific works, RIGID eliminates the need for expensive re-training of the model, resulting in approximately 60× faster performance. Furthermore, RIGID can be easily extended to other face domains, showcasing its versatility and adaptability. Extensive experiments demonstrate that RIGID outperforms state-of-the-art methods in inversion and editing tasks both qualitatively and quantitatively. |
Persistent Identifier | http://hdl.handle.net/10722/354550 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Xu, Yangyang | - |
dc.contributor.author | He, Shengfeng | - |
dc.contributor.author | Wong, Kwan-Yee K. | - |
dc.contributor.author | Luo, Ping | - |
dc.date.accessioned | 2025-02-13T00:35:17Z | - |
dc.date.available | 2025-02-13T00:35:17Z | - |
dc.date.issued | 2025-01-13 | - |
dc.identifier.citation | International Journal of Computer Vision, 2025 | - |
dc.identifier.uri | http://hdl.handle.net/10722/354550 | - |
dc.description.abstract | GAN inversion is essential for harnessing the editability of GANs in real images, yet existing methods that invert video frames individually often yield temporally inconsistent results. To address this issue, we present a unified recurrent framework, Recurrent vIdeo GAN Inversion and eDiting (RIGID), designed to enforce temporally coherent GAN inversion and facial editing in real videos explicitly and simultaneously. Our approach models temporal relations between current and previous frames in three ways: (1) by maximizing inversion fidelity and consistency through learning a temporally compensated latent code and spatial features, (2) by disentangling high-frequency incoherent noises from the latent space, and (3) by introducing an in-between frame composition constraint to eliminate inconsistency after attribute manipulation, ensuring that each frame is a direct composite of its neighbors. Compared to existing video- and attribute-specific works, RIGID eliminates the need for expensive re-training of the model, resulting in approximately 60× faster performance. Furthermore, RIGID can be easily extended to other face domains, showcasing its versatility and adaptability. Extensive experiments demonstrate that RIGID outperforms state-of-the-art methods in inversion and editing tasks both qualitatively and quantitatively. | - |
dc.language | eng | - |
dc.relation.ispartof | International Journal of Computer Vision | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.title | RIGID: Recurrent GAN Inversion and Editing of Real Face Videos and Beyond | - |
dc.type | Article | - |
dc.identifier.doi | 10.1007/s11263-024-02329-8 | - |