
Postgraduate thesis: Local structure encoding and representation in 2D and 3D synthesis

Title: Local structure encoding and representation in 2D and 3D synthesis
Authors: Gong, Bingchen [鞏炳辰]
Advisor(s): Yu, Y
Issue Date: 2021
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Gong, B. [鞏炳辰]. (2021). Local structure encoding and representation in 2D and 3D synthesis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Generative models are among the most promising approaches for endowing computers with an understanding of the world. We train neural networks as generative models to learn and understand 3D environments in which objects move, collide, and interact. This thesis enhances generative models with explicit local structure in two important applications: 2D single-image super-resolution and 3D point cloud completion.

Single-image super-resolution has been a popular research topic for the last two decades and has recently received a new wave of interest owing to deep neural networks. This thesis approaches the problem from a different perspective. Given a downsampled low-resolution image, we model the corresponding high-resolution image as a combination of two components: a deterministic component and a stochastic component. The deterministic component can be recovered from the low-frequency signals in the downsampled image, whereas the stochastic component contains signals that have little correlation with the low-resolution image. We adopt two complementary methods for generating these components: generative adversarial networks for the stochastic component, and a deep regression network for the deterministic component. Since the deterministic component exhibits clearer local orientations, we design novel loss functions tailored to this property for training the regression network. The two methods are first applied to the entire input image to produce two distinct high-resolution images. These two images are then fused by another deep neural network that also performs local statistical rectification, making the local statistics of the fused image match those of the ground-truth image. Quantitative results and a user study indicate that the proposed method outperforms existing state-of-the-art algorithms by a clear margin.

Point completion refers to completing the missing geometry of an object from incomplete observations. Mainstream methods predict the missing shape by decoding a global feature learned from the input point cloud, which often fails to preserve topological consistency and surface details. In this work, we present ME-PCN, a point completion network that leverages emptiness in 3D shape space. Given a single depth scan, previous methods typically encode the occupied partial shape while ignoring the empty regions (e.g., holes) in the depth map. In contrast, we argue that these 'emptiness' cues indicate shape boundaries and can be used to improve topological representation and the granularity of surface detail. Specifically, ME-PCN encodes both the occupied point cloud and the neighboring 'empty points'. It first estimates coarse-grained but complete and plausible surface points, followed by a refinement stage that produces fine-grained surface details. Comprehensive experiments verify that ME-PCN achieves better qualitative and quantitative performance than the state of the art. Moreover, we show that the 'emptiness' design is lightweight and easy to embed in existing methods, consistently improving their Chamfer Distance (CD) and Earth Mover's Distance (EMD) scores.

(Illustrative code sketches of the main ideas above follow this record summary.)
Degree: Doctor of Philosophy
Subject: Generative programming (Computer science)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/308622
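
The sketches below illustrate, in plain Python/NumPy, the main technical ideas named in the abstract. They are editorial illustrations under stated assumptions, not code from the thesis; all function names are hypothetical.

The abstract models a high-resolution image as a deterministic component, recoverable from the low-frequency signals of the downsampled image, plus a stochastic residual. A minimal sketch of that decomposition, assuming a grayscale float image whose side lengths are divisible by the scale factor:

    import numpy as np
    from scipy.ndimage import zoom

    def split_components(hr_image, scale=4):
        # Hypothetical helper: isolate the two components described in the
        # abstract. The thesis recovers them with trained networks; this
        # sketch only separates the signals with cubic resampling (order=3).
        lr = zoom(hr_image, 1.0 / scale, order=3)   # simulated low-resolution input
        deterministic = zoom(lr, scale, order=3)    # low-frequency part, recoverable from lr
        stochastic = hr_image - deterministic       # residual weakly correlated with lr
        return deterministic, stochastic

Local statistical rectification is described as making the local statistics of the fused image match those of the ground truth. One way to read that is window-wise moment matching, sketched below; in the thesis this role is played by a learned network, and the ground-truth reference is only available at training time:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_rectify(fused, reference, win=16, eps=1e-6):
        # Match the local mean and variance of `fused` to `reference`
        # within win-by-win neighbourhoods (hypothetical moment matching).
        mu_f = uniform_filter(fused, win)
        mu_r = uniform_filter(reference, win)
        var_f = uniform_filter(fused ** 2, win) - mu_f ** 2
        var_r = uniform_filter(reference ** 2, win) - mu_r ** 2
        gain = np.sqrt(np.clip(var_r, 0.0, None) / (np.clip(var_f, 0.0, None) + eps))
        return mu_r + gain * (fused - mu_f)

For the 3D part, ME-PCN encodes 'empty points' alongside the occupied point cloud. One plausible way to generate such points from a single depth scan (illustrative only; the thesis defines the actual sampling scheme) is to sample along each camera ray in front of the observed surface, where space is known to be unoccupied. Pinhole intrinsics fx, fy, cx, cy are assumed:

    import numpy as np

    def sample_empty_points(depth, fx, fy, cx, cy, n_per_ray=4):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0                            # pixels with an observed surface
        pts = []
        for t in np.linspace(0.3, 0.95, n_per_ray):  # fractions of each ray
            z = depth[valid] * t                     # strictly in front of the surface
            x = (u[valid] - cx) * z / fx             # pinhole back-projection
            y = (v[valid] - cy) * z / fy
            pts.append(np.stack([x, y, z], axis=1))
        return np.concatenate(pts, axis=0)           # (N, 3) points known to be empty

Finally, the CD score reported in the abstract is the Chamfer Distance between point sets; conventions differ (squared vs. unsquared distances, sum vs. mean), so the version below is one common variant. EMD, the Earth Mover's Distance, additionally requires an optimal matching between the two sets and is normally computed with a dedicated solver:

    import numpy as np

    def chamfer_distance(p, q):
        # p: (N, 3) and q: (M, 3) point sets; O(N*M) memory, so this
        # brute-force version suits small clouds only.
        d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
        return d.min(axis=1).mean() + d.min(axis=0).mean()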


Full metadata record (Dublin Core)

dc.contributor.advisor: Yu, Y
dc.contributor.author: Gong, Bingchen
dc.contributor.author: 鞏炳辰
dc.date.accessioned: 2021-12-06T01:04:00Z
dc.date.available: 2021-12-06T01:04:00Z
dc.date.issued: 2021
dc.identifier.citation: Gong, B. [鞏炳辰]. (2021). Local structure encoding and representation in 2D and 3D synthesis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/308622
dc.description.abstract: (same as the Abstract above)
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Generative programming (Computer science)
dc.title: Local structure encoding and representation in 2D and 3D synthesis
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2021
dc.identifier.mmsid: 991044448909803414
