
Postgraduate thesis: Deep learning methods for 3D visual data processing

Title: Deep learning methods for 3D visual data processing
Authors: Liu, Chang [刘畅]
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, C. [刘畅]. (2024). Deep learning methods for 3D visual data processing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Over the past decade, deep neural networks (DNNs) have made significant strides in artificial intelligence. In computer vision, 3D vision is essential for AI to understand the physical world, yet many technical challenges persist. This thesis presents a roadmap for advancing 3D neural networks, covering data-specific network design for event cameras, autolabeling of 3D point clouds, and a general, efficient 3D backbone network. First, we explore DNNs for event cameras, which capture optical changes at high temporal resolution; processing such data raises issues such as misalignment and motion blur. To address this, we propose ACE-BET, a novel architecture that converts event data into a dense tensor using heuristic methods and 3D spatio-temporal convolutions. ACE-BET processes this tensor in two branches, one for static and one for dynamic tasks, achieving state-of-the-art accuracy and the fastest inference speed across all seven tasks in our experiments. Recognizing that event data can be treated as a type of 3D point cloud, we then turn to general 3D point clouds. A major bottleneck in training on 3D point clouds is annotation, which is time-consuming and difficult because the clouds are sparse. To alleviate this, we introduce MAP-Gen and MTrans, autolabelers that generate 3D annotations from multimodal inputs of images and point clouds. By leveraging dense image information, they estimate the 3D coordinates of image pixels, thereby densifying the sparse point cloud and significantly improving annotation quality. MAP-Gen and MTrans outperform all previous autolabelers, achieving 94-98% of human annotation performance. Finally, we investigate a general network for understanding 3D point clouds. Throughout the work on ACE-BET, MAP-Gen, and MTrans, one major challenge was the scale of point clouds: for transformer architectures in particular, whose cost grows quadratically with sequence length, the cloud's resolution and size must be constrained, leading to information loss. We therefore propose ISAPnet, which uses a space tree to partition 3D space according to the point cloud's distribution, and variable-length sequences to match the varying levels of detail within a cloud. Experiments demonstrate ISAPnet's efficiency: it achieves comparable or 0.4% higher accuracy while using only 7% of the FLOPs of previous state-of-the-art methods.
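The pipeline descriptions in the abstract can be made more concrete with two simplified Python sketches. Neither is taken from the thesis; the function names, array layouts, bin count, and camera intrinsics below are illustrative assumptions only.

First, a minimal sketch of the general idea of converting an asynchronous event stream into a dense spatio-temporal tensor that 3D convolutions can consume (the abstract describes ACE-BET as doing this with heuristic methods, whose details are not reproduced here):

import numpy as np

def events_to_voxel_grid(x, y, t, p, height, width, num_bins=8):
    """Accumulate events (pixel x, pixel y, timestamp, polarity) into a
    dense (num_bins, height, width) tensor. Purely illustrative."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalise timestamps to [0, 1] and assign each event to a temporal bin.
    t_norm = (t - t.min()) / max(float(t.max() - t.min()), 1e-9)
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    # Positive-polarity events add intensity; negative-polarity events subtract it.
    np.add.at(grid, (bins, y.astype(int), x.astype(int)), np.where(p > 0, 1.0, -1.0))
    return grid

Second, the densification idea behind the autolabelers can be illustrated by back-projecting image pixels into 3D once a per-pixel depth estimate is available. In practice such depths would come from a learned model conditioned on the image and the sparse points; the pinhole intrinsics (fx, fy, cx, cy) here are assumptions:

def pixels_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) array of
    camera-frame 3D points using the standard pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # per-pixel column/row indices
    z = depth.reshape(-1)
    x3d = (u.reshape(-1) - cx) * z / fx
    y3d = (v.reshape(-1) - cy) * z / fy
    return np.stack([x3d, y3d, z], axis=1)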
Degree: Doctor of Philosophy
Subject: Computer vision
Three-dimensional imaging
Deep learning (Machine learning)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/352688

 

DC Field / Value
dc.contributor.author: Liu, Chang
dc.contributor.author: 刘畅
dc.date.accessioned: 2024-12-19T09:27:21Z
dc.date.available: 2024-12-19T09:27:21Z
dc.date.issued: 2024
dc.identifier.citation: Liu, C. [刘畅]. (2024). Deep learning methods for 3D visual data processing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/352688
dc.description.abstract: Over the past decade, deep neural networks (DNNs) have made significant strides in artificial intelligence. In computer vision, 3D vision is essential for AI to understand the physical world, yet many technical challenges persist. This thesis presents a roadmap for advancing 3D neural networks, covering data-specific network design for event cameras, autolabeling of 3D point clouds, and a general, efficient 3D backbone network. First, we explore DNNs for event cameras, which capture optical changes at high temporal resolution; processing such data raises issues such as misalignment and motion blur. To address this, we propose ACE-BET, a novel architecture that converts event data into a dense tensor using heuristic methods and 3D spatio-temporal convolutions. ACE-BET processes this tensor in two branches, one for static and one for dynamic tasks, achieving state-of-the-art accuracy and the fastest inference speed across all seven tasks in our experiments. Recognizing that event data can be treated as a type of 3D point cloud, we then turn to general 3D point clouds. A major bottleneck in training on 3D point clouds is annotation, which is time-consuming and difficult because the clouds are sparse. To alleviate this, we introduce MAP-Gen and MTrans, autolabelers that generate 3D annotations from multimodal inputs of images and point clouds. By leveraging dense image information, they estimate the 3D coordinates of image pixels, thereby densifying the sparse point cloud and significantly improving annotation quality. MAP-Gen and MTrans outperform all previous autolabelers, achieving 94-98% of human annotation performance. Finally, we investigate a general network for understanding 3D point clouds. Throughout the work on ACE-BET, MAP-Gen, and MTrans, one major challenge was the scale of point clouds: for transformer architectures in particular, whose cost grows quadratically with sequence length, the cloud's resolution and size must be constrained, leading to information loss. We therefore propose ISAPnet, which uses a space tree to partition 3D space according to the point cloud's distribution, and variable-length sequences to match the varying levels of detail within a cloud. Experiments demonstrate ISAPnet's efficiency: it achieves comparable or 0.4% higher accuracy while using only 7% of the FLOPs of previous state-of-the-art methods.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Computer vision
dc.subject.lcsh: Three-dimensional imaging
dc.subject.lcsh: Deep learning (Machine learning)
dc.title: Deep learning methods for 3D visual data processing
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044891408603414
