File Download: There are no files associated with this item.
Links for fulltext: may require subscription.
Conference Paper: Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Title: Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Authors: Liu, Xian; Wu, Qianyi; Zhou, Hang; Xu, Yinghao; Qian, Rui; Lin, Xinyi; Zhou, Xiaowei; Wu, Wayne; Dai, Bo; Zhou, Bolei
Keywords: Face and gestures; Vision + X
Issue Date: 2022
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 10452-10462
Abstract: Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movement in a holistic manner, where poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can be naturally described at multiple granularities and associated together. To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. In HA2G, a Hierarchical Audio Learner extracts audio representations across semantic granularities. A Hierarchical Pose Inferer subsequently renders the entire human pose gradually in a hierarchical manner. To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. Extensive experiments and human evaluation demonstrate that the proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin. Project page: https://alvinliu0.github.io/projects/HA2G.
Persistent Identifier: http://hdl.handle.net/10722/352302
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331
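The abstract above describes a coarse-to-fine pipeline: audio features extracted at several semantic granularities condition a pose generator that fills in joints level by level. The following is a minimal, illustrative sketch of that general idea, not the authors' released implementation; all module names, feature sizes, and the body/arms/hands joint split are hypothetical placeholders.

```python
# Illustrative sketch (assumed structure, not HA2G's actual code):
# multi-granularity audio features feed a hierarchical pose decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAudioEncoder(nn.Module):
    """Extracts audio features at multiple temporal granularities via pooling."""
    def __init__(self, in_dim=80, hid=256, levels=3):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, hid, kernel_size=3, padding=1)
        self.levels = levels

    def forward(self, mel):                          # mel: (B, T, in_dim)
        x = self.proj(mel.transpose(1, 2))           # (B, hid, T)
        feats = []
        for i in range(self.levels):
            # Coarser levels use larger average-pooling windows.
            k = 2 ** (self.levels - 1 - i)
            f = F.avg_pool1d(x, k, stride=1, padding=k // 2) if k > 1 else x
            feats.append(f[..., : x.shape[-1]].transpose(1, 2))  # (B, T, hid)
        return feats                                  # ordered coarse -> fine

class HierarchicalPoseDecoder(nn.Module):
    """Decodes joints coarse-to-fine, conditioning each level on the coarser ones."""
    def __init__(self, hid=256, joint_dims=(18, 24, 84)):  # body, arms, hands (hypothetical split)
        super().__init__()
        self.rnns, self.heads = nn.ModuleList(), nn.ModuleList()
        prev = 0
        for d in joint_dims:
            self.rnns.append(nn.GRU(hid + prev, hid, batch_first=True))
            self.heads.append(nn.Linear(hid, d))
            prev += d

    def forward(self, audio_feats):                   # list of (B, T, hid), coarse -> fine
        outputs, prev = [], None
        for feat, rnn, head in zip(audio_feats, self.rnns, self.heads):
            inp = feat if prev is None else torch.cat([feat, prev], dim=-1)
            h, _ = rnn(inp)
            outputs.append(head(h))                   # joints for this granularity
            prev = torch.cat(outputs, dim=-1)         # condition next level on all coarser parts
        return prev                                   # (B, T, sum(joint_dims))

if __name__ == "__main__":
    mel = torch.randn(2, 120, 80)                     # 2 clips, 120 frames of mel features
    pose = HierarchicalPoseDecoder()(HierarchicalAudioEncoder()(mel))
    print(pose.shape)                                 # torch.Size([2, 120, 126])
```

The paper's contrastive audio-text alignment objective is not sketched here; the example only illustrates the hierarchical conditioning described in the abstract.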

 

DC Field: Value
dc.contributor.author: Liu, Xian
dc.contributor.author: Wu, Qianyi
dc.contributor.author: Zhou, Hang
dc.contributor.author: Xu, Yinghao
dc.contributor.author: Qian, Rui
dc.contributor.author: Lin, Xinyi
dc.contributor.author: Zhou, Xiaowei
dc.contributor.author: Wu, Wayne
dc.contributor.author: Dai, Bo
dc.contributor.author: Zhou, Bolei
dc.date.accessioned: 2024-12-16T03:57:57Z
dc.date.available: 2024-12-16T03:57:57Z
dc.date.issued: 2022
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 10452-10462
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/352302
dc.description.abstract: Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movement in a holistic manner, where poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can be naturally described at multiple granularities and associated together. To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. In HA2G, a Hierarchical Audio Learner extracts audio representations across semantic granularities. A Hierarchical Pose Inferer subsequently renders the entire human pose gradually in a hierarchical manner. To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. Extensive experiments and human evaluation demonstrate that the proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin. Project page: https://alvinliu0.github.io/projects/HA2G.
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.subject: Face and gestures
dc.subject: Vision + X
dc.title: Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR52688.2022.01021
dc.identifier.scopus: eid_2-s2.0-85135544090
dc.identifier.volume: 2022-June
dc.identifier.spage: 10452
dc.identifier.epage: 10462
