
Postgraduate thesis: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system

Title: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system
Authors: Chan, Cheuk Pong [陳卓梆]
Advisor(s): Lau, HYK; Or, KL
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chan, C. P. [陳卓梆]. (2022). Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Constructing 3D virtual environments is an indispensable part of virtual reality (VR) application development: the environment defines everything users can see in the VR world, immersing players and inducing a feeling of presence. However, creating virtual environments consumes a great deal of time and effort, even for professional 3D modelers, so it is important to explore methods that automate the generation of 3D scenes for VR. Inspired by the rapid development of deep neural networks, specifically convolutional neural networks (CNNs), we propose to accelerate the virtual environment development cycle with a deep neural network-based 3D reconstruction system that takes 360° indoor RGB equirectangular panorama images as input and outputs a generated 3D scene. Each 3D scene is enclosed by the room’s 3D layout and populated with the 3D objects present in the input image. To simplify the virtual environment generation problem, we divide the system into four main subtasks: Room Layout Estimation, Object Detection, Object Pose Estimation, and Artistic Postprocessing. Each chapter of this thesis introduces the background of one subtask, its related work, and our proposed solution, and evaluates our approach with quantitative or qualitative experiments. The results show that most submodules operate at an error that is competitive with, or better than, existing approaches. Throughout the chapters we also present our secondary contributions. We propose a visual representation enhancement algorithm for a room layout estimation network; evaluation shows that it improves the objective visual realism of generated layouts and optimizes framerate and memory usage when the environment is rendered in Unity.
Furthermore, we introduce a new object pose estimation dataset, created by combining the unique advantages of previous related datasets. We show statistically that our dataset surpasses existing ones in image-source diversity and richness of annotation features. Finally, we cover the Unity-based annotation tool we created to accompany our dataset; it reads and edits annotations and lets users annotate their own custom datasets with 3-DOF poses.
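The four-subtask decomposition described in the abstract can be sketched as a simple pipeline. The stage names follow the abstract, but the function signatures, stubbed outputs, and data shapes below are hypothetical illustrations, not the thesis's actual models or API:

```python
# Hypothetical sketch of the four-stage reconstruction pipeline outlined in
# the abstract. The CNN internals of each stage are stubbed out; only the
# data flow from panorama image to assembled 3D scene is illustrated.

def estimate_room_layout(panorama):
    # Stage 1: Room Layout Estimation — predict the room's enclosing 3D
    # layout from the equirectangular panorama (stubbed as a cuboid).
    return {"type": "cuboid", "corners": 8}

def detect_objects(panorama):
    # Stage 2: Object Detection — locate object instances in the panorama.
    return [{"label": "chair"}, {"label": "table"}]

def estimate_object_poses(panorama, detections):
    # Stage 3: Object Pose Estimation — attach a 3-DOF pose to each detection.
    return [dict(d, pose=(0.0, 0.0, 0.0)) for d in detections]

def artistic_postprocess(layout, posed_objects):
    # Stage 4: Artistic Postprocessing — assemble the final scene for
    # rendering (e.g. in Unity).
    return {"layout": layout, "objects": posed_objects}

def reconstruct_scene(panorama):
    layout = estimate_room_layout(panorama)
    detections = detect_objects(panorama)
    posed = estimate_object_poses(panorama, detections)
    return artistic_postprocess(layout, posed)
```

The point of the sketch is the staged data flow: the layout and the posed objects are produced independently from the same panorama and only merged in the final postprocessing step.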
Degree: Master of Philosophy
Subjects: Virtual reality; Neural networks (Computer science); Computer vision
Dept/Program: Industrial and Manufacturing Systems Engineering
Persistent Identifier: http://hdl.handle.net/10722/313706

 

DC Field: Value
dc.contributor.advisor: Lau, HYK
dc.contributor.advisor: Or, KL
dc.contributor.author: Chan, Cheuk Pong
dc.contributor.author: 陳卓梆
dc.date.accessioned: 2022-06-26T09:32:36Z
dc.date.available: 2022-06-26T09:32:36Z
dc.date.issued: 2022
dc.identifier.citation: Chan, C. P. [陳卓梆]. (2022). Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/313706
dc.description.abstract: Constructing 3D virtual environments is an indispensable part of virtual reality (VR) application development: the environment defines everything users can see in the VR world, immersing players and inducing a feeling of presence. However, creating virtual environments consumes a great deal of time and effort, even for professional 3D modelers, so it is important to explore methods that automate the generation of 3D scenes for VR. Inspired by the rapid development of deep neural networks, specifically convolutional neural networks (CNNs), we propose to accelerate the virtual environment development cycle with a deep neural network-based 3D reconstruction system that takes 360° indoor RGB equirectangular panorama images as input and outputs a generated 3D scene. Each 3D scene is enclosed by the room’s 3D layout and populated with the 3D objects present in the input image. To simplify the virtual environment generation problem, we divide the system into four main subtasks: Room Layout Estimation, Object Detection, Object Pose Estimation, and Artistic Postprocessing. Each chapter of this thesis introduces the background of one subtask, its related work, and our proposed solution, and evaluates our approach with quantitative or qualitative experiments. The results show that most submodules operate at an error that is competitive with, or better than, existing approaches. Throughout the chapters we also present our secondary contributions. We propose a visual representation enhancement algorithm for a room layout estimation network; evaluation shows that it improves the objective visual realism of generated layouts and optimizes framerate and memory usage when the environment is rendered in Unity.
Furthermore, we introduce a new object pose estimation dataset, created by combining the unique advantages of previous related datasets. We show statistically that our dataset surpasses existing ones in image-source diversity and richness of annotation features. Finally, we cover the Unity-based annotation tool we created to accompany our dataset; it reads and edits annotations and lets users annotate their own custom datasets with 3-DOF poses.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Virtual reality
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Computer vision
dc.title: Reconstructing 3D indoor scene from RGB equirectangular panorama images with convolutional neural network (CNN) system
dc.type: PG_Thesis
dc.description.thesisname: Master of Philosophy
dc.description.thesislevel: Master
dc.description.thesisdiscipline: Industrial and Manufacturing Systems Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2022
dc.identifier.mmsid: 991044545291403414
