
Postgraduate thesis: Local structure encoding and representation in 2D and 3D synthesis

Title: Local structure encoding and representation in 2D and 3D synthesis
Authors: Gong, Bingchen [鞏炳辰]
Advisor(s): Yu, Y
Issue Date: 2021
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Gong, B. [鞏炳辰]. (2021). Local structure encoding and representation in 2D and 3D synthesis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Generative models are among the most promising approaches for endowing computers with an understanding of the world. We train neural networks as generative models to learn and understand 3D environments in which objects move, collide, and interact. This thesis enhances generative models with explicit local structure in two important applications: 2D single-image super-resolution and 3D point cloud completion.

Single-image super-resolution has been a popular research topic for the last two decades and has recently received a new wave of interest owing to deep neural networks. This thesis approaches the problem from a different perspective. Given a downsampled low-resolution image, we model the corresponding high-resolution image as a combination of two components: a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image, whereas the stochastic component contains signals that have little correlation with the low-resolution image. We adopt two complementary methods for generating these components: generative adversarial networks for the stochastic component, and a deep regression network for the deterministic component. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored to this property for training the regression network. The two methods are first applied to the entire input image to produce two distinct high-resolution images. These two images are then fused by another deep neural network that also performs local statistical rectification, making the local statistics of the fused image match those of the ground-truth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms by a clear margin.

Point completion refers to completing the missing geometry of an object from incomplete observations. Mainstream methods predict the missing shape by decoding a global feature learned from the input point cloud, which often fails to preserve topological consistency and surface details. In this work, we present ME-PCN, a point completion network that leverages emptiness in 3D shape space. Given a single depth scan, previous methods typically encode the occupied partial shape while ignoring the empty regions (e.g., holes) in the depth map. In contrast, we argue that these 'emptiness' cues indicate shape boundaries and can be used to improve topological representation and the granularity of surface detail. Specifically, ME-PCN encodes both the occupied point cloud and the neighboring 'empty points'. It first estimates coarse-grained but complete and plausible surface points, followed by a refinement stage that produces fine-grained surface details. Comprehensive experiments verify that ME-PCN achieves better qualitative and quantitative performance than the state of the art. Moreover, we show that the 'emptiness' design is lightweight and easy to embed in existing methods, consistently improving their Chamfer Distance (CD) and Earth Mover's Distance (EMD) scores.

(Illustrative code sketches of the main ideas above follow this record summary.)
Degree: Doctor of Philosophy
Subject: Generative programming (Computer science)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/308622
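
The sketches below illustrate, in plain Python/NumPy, the main technical ideas named in the abstract. They are editorial illustrations under stated assumptions, not code from the thesis; all function names are hypothetical.

The abstract models a high-resolution image as a deterministic component, recoverable from the low-frequency signals of the downsampled image, plus a stochastic residual. A minimal sketch of that decomposition, assuming a grayscale float image whose side lengths are divisible by the scale factor:

    import numpy as np
    from scipy.ndimage import zoom

    def split_components(hr_image, scale=4):
        # Hypothetical helper: isolate the two components described in the
        # abstract. The thesis recovers them with trained networks; this
        # sketch only separates the signals with cubic resampling (order=3).
        lr = zoom(hr_image, 1.0 / scale, order=3)   # simulated low-resolution input
        deterministic = zoom(lr, scale, order=3)    # low-frequency part, recoverable from lr
        stochastic = hr_image - deterministic       # residual weakly correlated with lr
        return deterministic, stochastic

Local statistical rectification is described as making the local statistics of the fused image match those of the ground truth. One way to read that is window-wise moment matching, sketched below; in the thesis this role is played by a learned network, and the ground-truth reference is only available at training time:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_rectify(fused, reference, win=16, eps=1e-6):
        # Match the local mean and variance of `fused` to `reference`
        # within win-by-win neighbourhoods (hypothetical moment matching).
        mu_f = uniform_filter(fused, win)
        mu_r = uniform_filter(reference, win)
        var_f = uniform_filter(fused ** 2, win) - mu_f ** 2
        var_r = uniform_filter(reference ** 2, win) - mu_r ** 2
        gain = np.sqrt(np.clip(var_r, 0.0, None) / (np.clip(var_f, 0.0, None) + eps))
        return mu_r + gain * (fused - mu_f)

For the 3D part, ME-PCN encodes 'empty points' alongside the occupied point cloud. One plausible way to generate such points from a single depth scan (illustrative only; the thesis defines the actual sampling scheme) is to sample along each camera ray in front of the observed surface, where space is known to be unoccupied. Pinhole intrinsics fx, fy, cx, cy are assumed:

    import numpy as np

    def sample_empty_points(depth, fx, fy, cx, cy, n_per_ray=4):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0                            # pixels with an observed surface
        pts = []
        for t in np.linspace(0.3, 0.95, n_per_ray):  # fractions of each ray
            z = depth[valid] * t                     # strictly in front of the surface
            x = (u[valid] - cx) * z / fx             # pinhole back-projection
            y = (v[valid] - cy) * z / fy
            pts.append(np.stack([x, y, z], axis=1))
        return np.concatenate(pts, axis=0)           # (N, 3) points known to be empty

Finally, the CD score reported in the abstract is the Chamfer Distance between point sets; conventions differ (squared vs. unsquared distances, sum vs. mean), so the version below is one common variant. EMD, the Earth Mover's Distance, additionally requires an optimal matching between the two sets and is normally computed with a dedicated solver:

    import numpy as np

    def chamfer_distance(p, q):
        # p: (N, 3) and q: (M, 3) point sets; O(N*M) memory, so this
        # brute-force version suits small clouds only.
        d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
        return d.min(axis=1).mean() + d.min(axis=0).mean()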


Full metadata record (Dublin Core)

dc.contributor.advisor: Yu, Y
dc.contributor.author: Gong, Bingchen
dc.contributor.author: 鞏炳辰
dc.date.accessioned: 2021-12-06T01:04:00Z
dc.date.available: 2021-12-06T01:04:00Z
dc.date.issued: 2021
dc.identifier.citation: Gong, B. [鞏炳辰]. (2021). Local structure encoding and representation in 2D and 3D synthesis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/308622
dc.description.abstract: (same as the Abstract above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Generative programming (Computer science)
dc.title: Local structure encoding and representation in 2D and 3D synthesis
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2021
dc.identifier.mmsid: 991044448909803414
