
Postgraduate thesis: Thin structure reconstruction and user-controlled human video rendering

Title: Thin structure reconstruction and user-controlled human video rendering
Authors: Liu Lingjie (劉玲潔)
Advisors: Wang, WP
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu Lingjie, [劉玲潔]. (2019). Thin structure reconstruction and user-controlled human video rendering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: There is a broad range of applications of 3D virtual environments in games, education, industry, AI assistants, etc. This thesis studies two aspects of modeling virtual environments: i) digital reconstruction of 3D objects for building a virtual environment; and ii) realistically rendered and controllable animations of human characters in a virtual environment.

Over the past decades there has been significant research progress in reconstructing large-scale 3D environments using image-based or depth-based methods. Image-based methods, such as structure-from-motion (SfM) and multi-view stereo (MVS), produce impressive results for objects with rich texture information by identifying point correspondences across multiple images of the same 3D scene. However, their performance may degrade significantly when the number of distinct point features is insufficient. In recent years, depth-based methods have gained popularity due to their real-time performance. These methods, e.g. KinectFusion and BundleFusion, align and fuse many low-quality input frames to extract a coherent surface. They assume that the scanned environment consists of relatively large objects with closed or extended boundary surfaces, so that multiple depth scans can be effectively integrated using a truncated signed distance field (TSDF) discretized on a pre-defined voxel grid.

Thin structures, which are commonly found in furniture design, metal sculpture, etc., are extremely difficult to reconstruct with either image-based or depth-based methods due to their unique characteristics: lack of point features, thin elements, and severe self-occlusion. Specifically, thin structures are generally of uniform color, so it is difficult to detect enough features on the wires to solve the key correspondence problem, which degrades the performance of image-based methods. Depth-based methods, on the other hand, simply fail to reconstruct thin structures because of the severe noise in the scanned data and the relatively low resolution of the voxel-based TSDF representation with respect to thin structures. We present novel image-based and depth-based methods for reconstructing 3D thin structures. Our image-based method produces high-fidelity results from as few as three input images. Our depth-based method uses curve skeletons as integration primitives for reconstructing very complex thin structures.
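To make the integration scheme concrete, below is a minimal sketch of one TSDF fusion step of the kind KinectFusion-style pipelines perform per depth frame. The function name, pinhole projection model, and parameters are our illustrative assumptions, not the thesis's implementation; the final weighted-averaging lines are where structures thinner than a few voxels get averaged away, which is the failure mode described above.

```python
# Illustrative sketch (not the thesis's code): fusing one depth frame into a
# TSDF discretized on a pre-defined voxel grid by weighted averaging.
import numpy as np

def integrate_depth_frame(tsdf, weights, depth, K, T_cam_from_world,
                          origin, voxel_size, trunc=0.04):
    """Fuse one depth frame (meters) into the running TSDF/weight volumes."""
    nx, ny, nz = tsdf.shape
    # World coordinates of every voxel center.
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts = origin + voxel_size * np.stack([ii, jj, kk], axis=-1)
    # Transform voxel centers into the camera frame.
    R, t = T_cam_from_world[:3, :3], T_cam_from_world[:3, 3]
    cam = pts @ R.T + t
    z = cam[..., 2]
    z_safe = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
    # Project onto the image plane with pinhole intrinsics K.
    u = np.round(K[0, 0] * cam[..., 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[..., 1] / z_safe + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.where(valid, depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
    valid &= d > 0
    # Truncated signed distance along the viewing ray.
    sdf = np.clip(d - z, -trunc, trunc)
    # Running weighted average over frames. This averaging on a fixed-resolution
    # grid is exactly what erases structures thinner than a few voxels.
    w_new = weights + valid
    tsdf[:] = np.where(valid, (tsdf * weights + sdf) / np.maximum(w_new, 1),
                       tsdf)
    weights[:] = w_new
```

A caller would allocate tsdf and weights as zero-filled float volumes and invoke this once per aligned frame; a surface-extraction pass (e.g. marching cubes) then recovers the fused geometry.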
The second topic of this thesis, the creation of realistically rendered and controllable animations of human characters, is another crucial task in virtual environments. Virtual actors play a key role in games and visual effects, telepresence, and VR/AR. Even with established human body modeling and rendering tools, it is still a challenging and time-consuming task to create a video-realistic rendering of a virtual human clone that is indistinguishable from the video of a real person. Typically, high-quality human body geometry and appearance models must be hand-crafted or captured from real humans with a sophisticated scanning setup. Recently, applying deep learning to rendering has brought significant improvements in synthesizing faces compared to traditional image-based rendering methods, but synthesizing photo-realistic imagery of a full human body remains an outstanding challenge. We present a new method for generating video-realistic animations of real humans with user control. In contrast to conventional human character rendering, we do not require a production-quality 3D model of the human; instead we use a video sequence in conjunction with a low-quality 3D template that can be easily obtained. We generate realistic videos of a real human by training a neural network to translate a simple rendering of the low-quality 3D template into realistic imagery. Although this simple translation in 2D screen space produces satisfactory results, artifacts remain, such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, e.g. pose-dependent wrinkles in the clothing. To address these limitations, we further develop a two-stage human video synthesis method that explicitly disentangles the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space, achieving much improved rendering quality.
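As a concrete illustration of the screen-space translation idea, here is a minimal sketch assuming a toy encoder-decoder trained with an L1 loss to map renderings of the coarse template to real video frames. The network shape, loss, and names are our assumptions for illustration only; the thesis's actual method uses more elaborate networks and objectives, including the two-stage disentanglement described above.

```python
# Illustrative sketch (not the thesis's code): translating simple renderings of
# a low-quality 3D template into realistic video frames with a small network.
import torch
import torch.nn as nn

class TemplateToVideo(nn.Module):
    """Toy encoder-decoder mapping a template rendering to an RGB frame."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, template_render):
        return self.net(template_render)

model = TemplateToVideo()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
l1 = nn.L1Loss()

# One illustrative training step on dummy data: renderings of the coarse
# template in, the corresponding real video frames as targets.
template = torch.randn(4, 3, 256, 256)
real = torch.randn(4, 3, 256, 256)
loss = l1(model(template), real)
opt.zero_grad()
loss.backward()
opt.step()
```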
Degree: Doctor of Philosophy
Subject: Three-dimensional imaging; Virtual humans (Artificial intelligence)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/279343

 

DC Field | Value | Language
dc.contributor.advisor | Wang, WP | -
dc.contributor.author | Liu Lingjie | -
dc.contributor.author | 劉玲潔 | -
dc.date.accessioned | 2019-10-28T03:02:24Z | -
dc.date.available | 2019-10-28T03:02:24Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Liu Lingjie, [劉玲潔]. (2019). Thin structure reconstruction and user-controlled human video rendering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/279343 | -
dc.description.abstract | (abstract text; reproduced in full in the Abstract field above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Three-dimensional imaging | -
dc.subject.lcsh | Virtual humans (Artificial intelligence) | -
dc.title | Thin structure reconstruction and user-controlled human video rendering | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2019 | -
dc.identifier.mmsid | 991044158789003414 | -
