
Postgraduate Thesis: Transformer-based architectures for automated annotation in 3D point clouds

Title: Transformer-based architectures for automated annotation in 3D point clouds
Authors: Qian, Xiaoyan (钱小燕)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Qian, X. [钱小燕]. (2024). Transformer-based architectures for automated annotation in 3D point clouds. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis addresses the challenges of annotating 3D point clouds, primarily sourced from LiDAR sensors, which are crucial for autonomous driving and robotics. Current methods for 3D object detection rely heavily on manual annotation of 3D point clouds, a labor-intensive and error-prone process. To overcome these limitations, this research develops automated 3D annotation architectures that aim to achieve human-level accuracy with significantly reduced manual input. Existing approaches are often complicated, relying on pipelined training for segmentation, cylindrical object proposals, and point completion, and they frequently fail to detect hard objects (hard samples) affected by truncation, occlusion, or distance; the sparse and irregular nature of point clouds compounds these issues. In response, this thesis introduces simplified, end-to-end trainable Transformer-based architectures. These architectures not only streamline the annotation process but also improve annotation quality by leveraging context-aware Transformer models, which are inherently permutation-invariant and adept at capturing the dense token-wise relationships needed to model local and global dependencies within the data. The thesis first explores the underutilized potential of inter-object relation learning, fostering collaboration between samples to enhance 3D annotation. By allowing the model to attend to well-defined, shape-complete samples (easy samples), our approach significantly improves the annotation accuracy and semantic understanding of hard samples. This underscores the importance of inter-object relations in sparse point cloud processing and provides a foundation for more efficient and accurate annotation architectures in future research. Additionally, we enhance the annotation architecture by incorporating Implicit Neural Representations (INR) to model the intricate surface characteristics of point cloud objects through a continuous function. We introduce an attention-conditioned INR into the 3D continuous representation learning process, effectively addressing the limitations of discrete point clouds. By combining explicit discrete point cloud representations with implicit continuous representations, we significantly enhance 3D annotation and obtain a more robust, comprehensive 3D representation that relies less on the specific points in the original cloud. Finally, we present an architecture for multimodal information fusion that leverages 2D data to guide 3D point cloud representation. By integrating the detailed geometric information of 3D point clouds with the rich semantic content of 2D visual data, our approach significantly improves the semantic accuracy and quality of 3D annotations, particularly for challenging samples. This integration not only enhances the expressiveness and reliability of the annotations but also substantially reduces the need for human intervention in the annotation process. The framework represents a significant advancement in multimodal fusion and offers a promising direction for future 3D annotation research. (Total words: 439)
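
As a rough illustration of the core mechanism described in the abstract, the sketch below (Python/PyTorch) shows how context-aware, permutation-invariant self-attention over per-object tokens can let sparse or occluded objects (hard samples) borrow information from shape-complete ones (easy samples). This is a minimal sketch under assumed names and dimensions (InterObjectAttention, 256-d tokens); it is not the thesis implementation.

import torch
import torch.nn as nn

class InterObjectAttention(nn.Module):
    # Illustrative inter-object relation block: one token per object proposal.
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Self-attention is permutation-invariant over the token axis, so the
        # ordering of objects extracted from the point cloud does not matter.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_objects, dim) -- one feature vector per object.
        # Hard samples can attend to easy, shape-complete samples in the scene.
        ctx, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + ctx)  # residual connection, as in standard Transformers

# Example: 2 scenes, 16 object tokens each, 256-d features (all shapes illustrative).
feats = torch.randn(2, 16, 256)
refined = InterObjectAttention()(feats)
print(refined.shape)  # torch.Size([2, 16, 256])

Permuting the object tokens permutes the outputs identically, reflecting the order-independence of attention over unordered point-cloud data that the abstract refers to.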
Degree: Doctor of Philosophy
Subject: Computer vision; Optical radar; Three-dimensional imaging
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/352681

 

DC Field / Value
dc.contributor.author: Qian, Xiaoyan
dc.contributor.author: 钱小燕
dc.date.accessioned: 2024-12-19T09:27:14Z
dc.date.available: 2024-12-19T09:27:14Z
dc.date.issued: 2024
dc.identifier.citation: Qian, X. [钱小燕]. (2024). Transformer-based architectures for automated annotation in 3D point clouds. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/352681
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Computer vision
dc.subject.lcsh: Optical radar
dc.subject.lcsh: Three-dimensional imaging
dc.title: Transformer-based architectures for automated annotation in 3D point clouds
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044891409403414
