
Postgraduate thesis: Thin structure reconstruction and user-controlled human video rendering

Title: Thin structure reconstruction and user-controlled human video rendering
Authors: Liu Lingjie (劉玲潔)
Advisors: Wang, WP
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu Lingjie, [劉玲潔]. (2019). Thin structure reconstruction and user-controlled human video rendering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: There is a broad range of applications of 3D virtual environments in games, education, industry, AI assistants, etc. This thesis studies two aspects of modeling virtual environments: i) digital reconstruction of 3D objects for building a virtual environment; and ii) realistically rendered and controllable animations of human characters in a virtual environment.

Over the past decades there has been significant research progress in reconstructing large-scale 3D environments using image-based or depth-based methods. Image-based methods, such as structure-from-motion (SfM) and multi-view stereo (MVS), produce impressive results for objects with rich texture information by identifying point correspondences across multiple images of the same 3D scene. However, their performance may degrade significantly when the number of distinct point features is insufficient. In recent years, depth-based methods have gained popularity due to their real-time performance. These methods, e.g. KinectFusion and BundleFusion, align and fuse many low-quality input frames to extract a coherent surface. They assume that the scanned environment consists of relatively large objects with closed or extended boundary surfaces, so that multiple depth scans can be effectively integrated using a truncated signed distance field (TSDF) discretized on a pre-defined voxel grid.

Thin structures, which are commonly found in furniture design, metal sculpture, etc., are extremely difficult to reconstruct with either image-based or depth-based methods due to their unique characteristics: lack of point features, thin elements, and severe self-occlusion. Specifically, thin structures are generally of uniform color, so it is difficult to detect enough features on the wires to solve the key correspondence problem, which degrades the performance of image-based methods. Depth-based methods, on the other hand, simply fail to reconstruct thin structures because of the severe noise in the scanned data and the relatively low resolution of the voxel-based TSDF representation with respect to thin structures. We present novel image-based and depth-based methods for reconstructing 3D thin structures. Our image-based method produces high-fidelity results from as few as three input images. Our depth-based method uses curve skeletons as integration primitives for reconstructing very complex thin structures.
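To make the integration scheme concrete, below is a minimal sketch of one TSDF fusion step of the kind KinectFusion-style pipelines perform per depth frame. The function name, pinhole projection model, and parameters are our illustrative assumptions, not the thesis's implementation; the final weighted-averaging lines are where structures thinner than a few voxels get averaged away, which is the failure mode described above.

```python
# Illustrative sketch (not the thesis's code): fusing one depth frame into a
# TSDF discretized on a pre-defined voxel grid by weighted averaging.
import numpy as np

def integrate_depth_frame(tsdf, weights, depth, K, T_cam_from_world,
                          origin, voxel_size, trunc=0.04):
    """Fuse one depth frame (meters) into the running TSDF/weight volumes."""
    nx, ny, nz = tsdf.shape
    # World coordinates of every voxel center.
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts = origin + voxel_size * np.stack([ii, jj, kk], axis=-1)
    # Transform voxel centers into the camera frame.
    R, t = T_cam_from_world[:3, :3], T_cam_from_world[:3, 3]
    cam = pts @ R.T + t
    z = cam[..., 2]
    z_safe = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
    # Project onto the image plane with pinhole intrinsics K.
    u = np.round(K[0, 0] * cam[..., 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[..., 1] / z_safe + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.where(valid, depth[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)], 0.0)
    valid &= d > 0
    # Truncated signed distance along the viewing ray.
    sdf = np.clip(d - z, -trunc, trunc)
    # Running weighted average over frames. This averaging on a fixed-resolution
    # grid is exactly what erases structures thinner than a few voxels.
    w_new = weights + valid
    tsdf[:] = np.where(valid, (tsdf * weights + sdf) / np.maximum(w_new, 1),
                       tsdf)
    weights[:] = w_new
```

A caller would allocate tsdf and weights as zero-filled float volumes and invoke this once per aligned frame; a surface-extraction pass (e.g. marching cubes) then recovers the fused geometry.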
The second topic of this thesis, the creation of realistically rendered and controllable animations of human characters, is another crucial task in virtual environments. Virtual actors play a key role in games and visual effects, telepresence, and VR/AR. Even with established human body modeling and rendering tools, it is still a challenging and time-consuming task to create a video-realistic rendering of a virtual human clone that is indistinguishable from the video of a real person. Typically, high-quality human body geometry and appearance models must be hand-crafted or captured from real humans with a sophisticated scanning setup. Recently, applying deep learning to rendering has brought significant improvements in synthesizing faces compared to traditional image-based rendering methods, but synthesizing photo-realistic imagery of a full human body remains an outstanding challenge. We present a new method for generating video-realistic animations of real humans with user control. In contrast to conventional human character rendering, we do not require a production-quality 3D model of the human; instead we use a video sequence in conjunction with a low-quality 3D template that can be easily obtained. We generate realistic videos of a real human by training a neural network to translate a simple rendering of the low-quality 3D template into realistic imagery. Although this simple translation in 2D screen space produces satisfactory results, artifacts remain, such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, e.g. pose-dependent wrinkles in the clothing. To address these limitations, we further develop a two-stage human video synthesis method that explicitly disentangles the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space, achieving much improved rendering quality.
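As a concrete illustration of the screen-space translation idea, here is a minimal sketch assuming a toy encoder-decoder trained with an L1 loss to map renderings of the coarse template to real video frames. The network shape, loss, and names are our assumptions for illustration only; the thesis's actual method uses more elaborate networks and objectives, including the two-stage disentanglement described above.

```python
# Illustrative sketch (not the thesis's code): translating simple renderings of
# a low-quality 3D template into realistic video frames with a small network.
import torch
import torch.nn as nn

class TemplateToVideo(nn.Module):
    """Toy encoder-decoder mapping a template rendering to an RGB frame."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, template_render):
        return self.net(template_render)

model = TemplateToVideo()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
l1 = nn.L1Loss()

# One illustrative training step on dummy data: renderings of the coarse
# template in, the corresponding real video frames as targets.
template = torch.randn(4, 3, 256, 256)
real = torch.randn(4, 3, 256, 256)
loss = l1(model(template), real)
opt.zero_grad()
loss.backward()
opt.step()
```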
Degree: Doctor of Philosophy
Subject: Three-dimensional imaging; Virtual humans (Artificial intelligence)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/279343

 

DC Field | Value | Language
dc.contributor.advisor | Wang, WP | -
dc.contributor.author | Liu Lingjie | -
dc.contributor.author | 劉玲潔 | -
dc.date.accessioned | 2019-10-28T03:02:24Z | -
dc.date.available | 2019-10-28T03:02:24Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Liu Lingjie, [劉玲潔]. (2019). Thin structure reconstruction and user-controlled human video rendering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/279343 | -
dc.description.abstract | (abstract text; reproduced in full in the Abstract field above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Three-dimensional imaging | -
dc.subject.lcsh | Virtual humans (Artificial intelligence) | -
dc.title | Thin structure reconstruction and user-controlled human video rendering | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2019 | -
dc.identifier.mmsid | 991044158789003414 | -
