
Postgraduate Thesis: Transformer-based architectures for automated annotation in 3D point clouds

Title: Transformer-based architectures for automated annotation in 3D point clouds
Authors: Qian, Xiaoyan (钱小燕)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Qian, X. [钱小燕]. (2024). Transformer-based architectures for automated annotation in 3D point clouds. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis addresses the challenges of annotating 3D point clouds, primarily sourced from LiDAR sensors, which are crucial for autonomous driving and robotics. Current methods for 3D object detection rely heavily on manual annotation of 3D point clouds, a labor-intensive and error-prone process. To overcome these limitations, this research develops automated 3D annotation architectures that aim to achieve human-level accuracy with significantly reduced manual input. Existing approaches are often complicated, relying on pipelined training for segmentation, cylindrical object proposals, and point completion, and they frequently fail to detect hard objects (hard samples) affected by truncation, occlusion, or distance; the sparse and irregular nature of point clouds compounds these issues. In response, this thesis introduces simplified, end-to-end trainable Transformer-based architectures. These architectures not only streamline the annotation process but also improve annotation quality by leveraging context-aware Transformer models, which are inherently permutation-invariant and adept at capturing the dense token-wise relationships needed to model local and global dependencies within the data. The thesis first explores the underutilized potential of inter-object relation learning, fostering collaboration between samples to enhance 3D annotation. By allowing the model to attend to well-defined, shape-complete samples (easy samples), our approach significantly improves the annotation accuracy and semantic understanding of hard samples. This underscores the importance of inter-object relations in sparse point cloud processing and provides a foundation for more efficient and accurate annotation architectures in future research. Additionally, we enhance the annotation architecture by incorporating Implicit Neural Representations (INR) to model the intricate surface characteristics of point cloud objects through a continuous function. We introduce an attention-conditioned INR into the 3D continuous representation learning process, effectively addressing the limitations of discrete point clouds. By combining explicit discrete point cloud representations with implicit continuous representations, we significantly enhance 3D annotation and obtain a more robust, comprehensive 3D representation that relies less on the specific points in the original cloud. Finally, we present an architecture for multimodal information fusion that leverages 2D data to guide 3D point cloud representation. By integrating the detailed geometric information of 3D point clouds with the rich semantic content of 2D visual data, our approach significantly improves the semantic accuracy and quality of 3D annotations, particularly for challenging samples. This integration not only enhances the expressiveness and reliability of the annotations but also substantially reduces the need for human intervention in the annotation process. The framework represents a significant advancement in multimodal fusion and offers a promising direction for future 3D annotation research. (Total words: 439)
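
As a rough illustration of the core mechanism described in the abstract, the sketch below (Python/PyTorch) shows how context-aware, permutation-invariant self-attention over per-object tokens can let sparse or occluded objects (hard samples) borrow information from shape-complete ones (easy samples). This is a minimal sketch under assumed names and dimensions (InterObjectAttention, 256-d tokens); it is not the thesis implementation.

import torch
import torch.nn as nn

class InterObjectAttention(nn.Module):
    # Illustrative inter-object relation block: one token per object proposal.
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Self-attention is permutation-invariant over the token axis, so the
        # ordering of objects extracted from the point cloud does not matter.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_objects, dim) -- one feature vector per object.
        # Hard samples can attend to easy, shape-complete samples in the scene.
        ctx, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + ctx)  # residual connection, as in standard Transformers

# Example: 2 scenes, 16 object tokens each, 256-d features (all shapes illustrative).
feats = torch.randn(2, 16, 256)
refined = InterObjectAttention()(feats)
print(refined.shape)  # torch.Size([2, 16, 256])

Permuting the object tokens permutes the outputs identically, reflecting the order-independence of attention over unordered point-cloud data that the abstract refers to.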
Degree: Doctor of Philosophy
Subject: Computer vision; Optical radar; Three-dimensional imaging
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/352681

 

DC Field / Value
dc.contributor.author: Qian, Xiaoyan
dc.contributor.author: 钱小燕
dc.date.accessioned: 2024-12-19T09:27:14Z
dc.date.available: 2024-12-19T09:27:14Z
dc.date.issued: 2024
dc.identifier.citation: Qian, X. [钱小燕]. (2024). Transformer-based architectures for automated annotation in 3D point clouds. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/352681
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Computer vision
dc.subject.lcsh: Optical radar
dc.subject.lcsh: Three-dimensional imaging
dc.title: Transformer-based architectures for automated annotation in 3D point clouds
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044891409403414
