
Postgraduate thesis: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system

Title: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system
Authors: Chan, Cheuk Pong [陳卓梆]
Advisor(s): Lau, HYK; Or, KL
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chan, C. P. [陳卓梆]. (2022). Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Constructing 3D virtual environments is an indispensable part of virtual reality (VR) application development: the environment defines everything users can see in the VR world, immersing players and inducing a feeling of presence. However, creating virtual environments consumes a great deal of time and effort, even for professional 3D modelers, so it is important to explore methods that automate the generation of 3D scenes for VR. Inspired by the rapid development of deep neural networks, specifically convolutional neural networks (CNNs), we propose to accelerate the virtual environment development cycle with a deep neural network-based 3D reconstruction system that takes 360° indoor RGB equirectangular panorama images as input and outputs a generated 3D scene. Each 3D scene is enclosed by the room’s 3D layout and populated with the 3D objects present in the input image. To simplify the virtual environment generation problem, we divide the system into four main subtasks: Room Layout Estimation, Object Detection, Object Pose Estimation, and Artistic Postprocessing. Each chapter of this thesis introduces the background of one subtask, its related work, and our proposed solution, and evaluates our approach with quantitative or qualitative experiments. The results show that most submodules operate at an error that is competitive with, or better than, existing approaches. Throughout the chapters we also present our secondary contributions. We propose a visual representation enhancement algorithm for a room layout estimation network; evaluation shows that it improves the objective visual realism of generated layouts and optimizes framerate and memory usage when the environment is rendered in Unity.
Furthermore, we introduce a new object pose estimation dataset, created by combining the unique advantages of previous related datasets. We show statistically that our dataset surpasses existing ones in image-source diversity and richness of annotation features. Finally, we cover the Unity-based annotation tool we created to accompany our dataset; it reads and edits annotations and lets users annotate their own custom datasets with 3-DOF poses.
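The four-subtask decomposition described in the abstract can be sketched as a simple pipeline. The stage names follow the abstract, but the function signatures, stubbed outputs, and data shapes below are hypothetical illustrations, not the thesis's actual models or API:

```python
# Hypothetical sketch of the four-stage reconstruction pipeline outlined in
# the abstract. The CNN internals of each stage are stubbed out; only the
# data flow from panorama image to assembled 3D scene is illustrated.

def estimate_room_layout(panorama):
    # Stage 1: Room Layout Estimation — predict the room's enclosing 3D
    # layout from the equirectangular panorama (stubbed as a cuboid).
    return {"type": "cuboid", "corners": 8}

def detect_objects(panorama):
    # Stage 2: Object Detection — locate object instances in the panorama.
    return [{"label": "chair"}, {"label": "table"}]

def estimate_object_poses(panorama, detections):
    # Stage 3: Object Pose Estimation — attach a 3-DOF pose to each detection.
    return [dict(d, pose=(0.0, 0.0, 0.0)) for d in detections]

def artistic_postprocess(layout, posed_objects):
    # Stage 4: Artistic Postprocessing — assemble the final scene for
    # rendering (e.g. in Unity).
    return {"layout": layout, "objects": posed_objects}

def reconstruct_scene(panorama):
    layout = estimate_room_layout(panorama)
    detections = detect_objects(panorama)
    posed = estimate_object_poses(panorama, detections)
    return artistic_postprocess(layout, posed)
```

The point of the sketch is the staged data flow: the layout and the posed objects are produced independently from the same panorama and only merged in the final postprocessing step.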
Degree: Master of Philosophy
Subjects: Virtual reality; Neural networks (Computer science); Computer vision
Dept/Program: Industrial and Manufacturing Systems Engineering
Persistent Identifier: http://hdl.handle.net/10722/313706

 

DC Field: Value
dc.contributor.advisor: Lau, HYK
dc.contributor.advisor: Or, KL
dc.contributor.author: Chan, Cheuk Pong
dc.contributor.author: 陳卓梆
dc.date.accessioned: 2022-06-26T09:32:36Z
dc.date.available: 2022-06-26T09:32:36Z
dc.date.issued: 2022
dc.identifier.citation: Chan, C. P. [陳卓梆]. (2022). Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/313706
dc.description.abstract: Constructing 3D virtual environments is an indispensable part of virtual reality (VR) application development: the environment defines everything users can see in the VR world, immersing players and inducing a feeling of presence. However, creating virtual environments consumes a great deal of time and effort, even for professional 3D modelers, so it is important to explore methods that automate the generation of 3D scenes for VR. Inspired by the rapid development of deep neural networks, specifically convolutional neural networks (CNNs), we propose to accelerate the virtual environment development cycle with a deep neural network-based 3D reconstruction system that takes 360° indoor RGB equirectangular panorama images as input and outputs a generated 3D scene. Each 3D scene is enclosed by the room’s 3D layout and populated with the 3D objects present in the input image. To simplify the virtual environment generation problem, we divide the system into four main subtasks: Room Layout Estimation, Object Detection, Object Pose Estimation, and Artistic Postprocessing. Each chapter of this thesis introduces the background of one subtask, its related work, and our proposed solution, and evaluates our approach with quantitative or qualitative experiments. The results show that most submodules operate at an error that is competitive with, or better than, existing approaches. Throughout the chapters we also present our secondary contributions. We propose a visual representation enhancement algorithm for a room layout estimation network; evaluation shows that it improves the objective visual realism of generated layouts and optimizes framerate and memory usage when the environment is rendered in Unity.
Furthermore, we introduce a new object pose estimation dataset, created by combining the unique advantages of previous related datasets. We show statistically that our dataset surpasses existing ones in image-source diversity and richness of annotation features. Finally, we cover the Unity-based annotation tool we created to accompany our dataset; it reads and edits annotations and lets users annotate their own custom datasets with 3-DOF poses.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Virtual reality
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Computer vision
dc.title: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system
dc.type: PG_Thesis
dc.description.thesisname: Master of Philosophy
dc.description.thesislevel: Master
dc.description.thesisdiscipline: Industrial and Manufacturing Systems Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2022
dc.identifier.mmsid: 991044545291403414
