Postgraduate Thesis: Deep learning methods for human motion synthesis under diverse conditions

Title: Deep learning methods for human motion synthesis under diverse conditions
Authors: Wan, Weilin (萬威凛)
Advisors: Komura, T; Wang, WP
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Wan, W. [萬威凛]. (2024). Deep learning methods for human motion synthesis under diverse conditions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis explores novel methods for synthesizing human motion under three kinds of conditions: environmental settings, language descriptions, and a combination of language and spatial inputs. The research aims to advance methods for observing, predicting, and generating human motion, which is crucial for practical applications in augmented and virtual reality (AR/VR), gaming, and robotics. The thesis presents a series of methodological innovations: beginning with a novel method and dataset for analyzing human-object interactions, it then turns to generating human motions from linguistic inputs and advances toward more precise spatial control over the synthesized movements.

The first part of the thesis focuses on predicting the future states of human motions and objects during interactions, specifically with large objects commonly found in daily life. It introduces a comprehensive dataset of full-body human motions interacting with a variety of large objects, and proposes object dynamic descriptors, which encapsulate essential dynamic properties derived from simulations. To exploit this object dynamic information, the study also presents a novel graph neural network, HO-GCN, that significantly improves prediction accuracy, demonstrating the potential to enhance human-robot collaboration and virtual reality applications.

The next work explores text-conditioned human motion generation in the frequency domain, broadening the inputs available for motion generation. It introduces an approach for generating diverse, high-quality human motion sequences from textual descriptions by leveraging phases in the frequency domain. By learning a periodic parameterized phase space and employing a conditional diffusion model, this work achieves efficient and smooth transitions between motion sequences and extends the versatility of motion generation, allowing more dynamic and varied animations.

The thesis also introduces "TLControl," a method integrating Trajectory and Language for motion synthesis. TLControl synthesizes precise and realistic human motions from both trajectory and language inputs, using a part-based VQ-VAE for motion embedding, a Masked Trajectories Transformer for language-conditioned initial motion prediction, and a runtime optimization strategy for handling spatial controls. It allows high flexibility and accuracy in motion synthesis according to user specifications, achieves better performance across critical metrics than existing works, and demonstrates improved runtime efficiency, marking a step forward in controllable motion generation with detailed user interaction and modification.

Overall, this thesis introduces novel methodologies that employ neural networks to analyze and synthesize human motion, enhancing both accuracy and flexibility, and promising to improve how users create character animation in virtual environments and how robots interact with people in real-world settings. It offers practical contributions to character animation and human-robot interaction.
Degree: Doctor of Philosophy
Subject: Human mechanics - Computer simulation; Deep learning (Machine learning)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/354718
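
For readers who want a concrete picture of the graph-network idea summarized in the abstract, the following is a minimal, illustrative sketch of one graph-convolution layer over a combined human-joint / object-keypoint graph. It is not the thesis's actual HO-GCN; the node counts, feature dimensions, and adjacency are all assumptions.

```python
# Illustrative sketch only: a single graph-convolution layer over a graph
# whose nodes are human joints plus object keypoints. Shapes and node
# counts are assumptions, not the thesis's HO-GCN.
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One message-passing step: mix node features through a normalized
    adjacency matrix, then apply a shared linear map and a nonlinearity."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (batch, num_nodes, in_dim) node features, e.g. joint positions
        #      concatenated with hypothetical object dynamic descriptors
        # adj: (num_nodes, num_nodes) row-normalized adjacency
        return torch.relu(self.linear(adj @ x))

# Toy usage: 22 human joints + 8 object keypoints = 30 graph nodes.
num_nodes, feat_dim = 30, 16
adj = torch.eye(num_nodes)                  # self-loops only, for brevity
x = torch.randn(4, num_nodes, feat_dim)     # a batch of 4 frames
layer = GraphConvLayer(feat_dim, 32)
print(layer(x, adj).shape)                  # torch.Size([4, 30, 32])
```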
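
The frequency-domain phase representation mentioned for the text-to-motion work can likewise be illustrated with a toy parameterization in which each latent channel is a sinusoid with a learned amplitude, frequency, offset, and phase shift. This is a generic sketch of periodic phase features; all names and shapes are assumptions, not the thesis's model.

```python
# Toy periodic phase parameterization: each latent channel is described by
# amplitude A, frequency F, offset B, and phase shift S, and reconstructed
# as a sinusoidal curve over time. Illustrative assumptions throughout.
import math
import torch

def phase_to_curves(A, F, B, S, t):
    # A, F, B, S: (channels,) per-channel phase parameters
    # t:          (frames,)   time stamps in seconds
    # returns     (channels, frames) periodic latent curves
    return A[:, None] * torch.sin(
        2 * math.pi * (F[:, None] * t[None, :] + S[:, None])
    ) + B[:, None]

channels, fps, seconds = 8, 30, 2
t = torch.arange(seconds * fps, dtype=torch.float32) / fps
A, F, B, S = (torch.rand(channels) for _ in range(4))
curves = phase_to_curves(A, F, B, S, t)
print(curves.shape)  # torch.Size([8, 60])
```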
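
Finally, the runtime-optimization strategy described for TLControl can be sketched as gradient refinement of a latent code so that the decoded root trajectory follows a user-given path. The decoder and the continuous latent below are placeholders standing in for the part-based VQ-VAE codes and the transformer's initial prediction, which are not reproduced here.

```python
# Minimal sketch of runtime trajectory optimization: refine a latent by
# gradient descent so the decoded root trajectory matches a user path.
# Decoder and latent are illustrative placeholders, not the thesis's model.
import torch
import torch.nn as nn

frames = 60
decoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                        nn.Linear(64, frames * 3))
for p in decoder.parameters():
    p.requires_grad_(False)                      # only the latent is refined

latent = torch.randn(1, 32, requires_grad=True)  # stand-in initial prediction
target = torch.zeros(1, frames, 3)               # user-specified root path:
target[0, :, 0] = torch.linspace(0, 1, frames)   # walk 1 m along the x-axis

opt = torch.optim.Adam([latent], lr=1e-2)
for step in range(200):
    motion = decoder(latent).view(1, frames, 3)  # decoded root positions
    loss = ((motion - target) ** 2).mean()       # trajectory-following loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))                               # approaches zero as it fits
```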

DC Field: Value
dc.contributor.advisor: Komura, T
dc.contributor.advisor: Wang, WP
dc.contributor.author: Wan, Weilin
dc.contributor.author: 萬威凛
dc.date.accessioned: 2025-03-04T09:30:51Z
dc.date.available: 2025-03-04T09:30:51Z
dc.date.issued: 2024
dc.identifier.citation: Wan, W. [萬威凛]. (2024). Deep learning methods for human motion synthesis under diverse conditions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/354718
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Human mechanics - Computer simulation
dc.subject.lcsh: Deep learning (Machine learning)
dc.title: Deep learning methods for human motion synthesis under diverse conditions
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2025
dc.identifier.mmsid: 991044911103703414
