
Postgraduate thesis: Improving depth estimation through generalization enhancement in deep learning

Title: Improving depth estimation through generalization enhancement in deep learning
Authors: Zhang, Zezheng (張澤正)
Advisors: Wong, KKY; Tsia, KKM
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhang, Z. [張澤正]. (2024). Improving depth estimation through generalization enhancement in deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Depth perception plays a critical role in human interaction with the external world, enabling navigation, distance estimation, and object interaction. As applications that mimic human behaviors, such as autonomous driving, simultaneous localization and mapping (SLAM), and augmented reality (AR), rapidly advance, depth perception research has become increasingly important. Additionally, addressing the defocus problem in biomedical images requires advances in depth perception techniques, such as defocusing distance prediction. Data-driven methods, especially deep learning, have played a crucial role in the advancement of depth estimation by capturing complex relationships between images and their corresponding depth information. However, existing methods still have room for improvement, particularly in the generalization of deep learning models to unseen data. In this thesis, we focus on enhancing the generalization of depth estimation in several directions. First, for whole slide imaging (WSI) used in pathological diagnosis, where high-resolution images are difficult to capture in sharp focus, deep learning-based defocusing distance prediction struggles, especially on images collected with protocols that differ from those of the training dataset. We propose quantized spatial phase modulation as a preprocessing method, which applies a quantized phase mask in the Fourier domain of the input image and uses the post-processed image, together with its Fourier amplitude and Fourier phase, as input to the neural network. This makes images with similar defocusing distances more distinguishable, leading to improved generalization performance. Next, for macroscopic images such as outdoor driving scenes, which involve complex changes in external conditions, addressing the domain generalization problem becomes even more crucial. One challenge is to perform accurate depth estimation under both daytime and nighttime conditions in order to meet the demand for all-day driving. We introduce a two-branch network that incorporates both a CNN and a Transformer as encoders. By leveraging the complementary strengths of the two encoders and using a pretrained CycleGAN to transfer images between domains, the representational ability of the neural network is greatly enhanced, enabling accurate depth estimation for both daytime and nighttime images within a single model. When two images are available as input, stereo matching offers a more accurate route to depth estimation. To address the performance degradation of stereo matching on unseen domains, we propose a domain generalization method applicable to most stereo matching networks. A view consistency constraint is applied to the left and right features before constructing the cost volume. In addition, disparity refinement is performed based on the cosine similarity map and the shallower features of the stereo matching network. Applying this method yields significant generalization improvements over several baseline models. Through these proposed techniques, our research aims to enhance the generalization of depth estimation across various sub-directions, covering both micro and macro images as well as supervised and self-supervised approaches. The outcomes of this research have the potential to enhance the quality and fidelity of depth perception, contributing to advancements in perceiving the 3D world from 2D images.
Degree: Doctor of Philosophy
Subject: Depth perception; Deep learning (Machine learning)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/350309
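
The abstract above outlines three techniques concretely enough to sketch. The first is the quantized spatial phase modulation preprocessing for defocusing distance prediction. The snippet below is a minimal NumPy illustration only: the radial mask profile, the number of quantization levels, and whether the Fourier amplitude and phase are taken from the original or the modulated spectrum are assumptions, not the thesis's exact configuration.

```python
# Illustrative sketch of quantized spatial phase modulation as a preprocessing step.
# Mask profile and quantization levels are assumptions for demonstration purposes.
import numpy as np

def quantized_phase_mask(shape, levels=4):
    """Build a hypothetical radially varying phase mask quantized to `levels` steps."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    r = r / r.max()                        # normalize radius to [0, 1]
    phase = 2 * np.pi * r                  # continuous phase profile (assumption)
    step = 2 * np.pi / levels
    return np.round(phase / step) * step   # quantize to discrete phase levels

def preprocess(image, levels=4):
    """Return the modulated image plus Fourier amplitude and phase as a 3-channel input."""
    f = np.fft.fftshift(np.fft.fft2(image))
    f_mod = f * np.exp(1j * quantized_phase_mask(f.shape, levels))
    modulated = np.abs(np.fft.ifft2(np.fft.ifftshift(f_mod)))
    # Assumption: amplitude/phase are taken from the original spectrum; note the
    # phase-only mask leaves the Fourier amplitude unchanged in any case.
    amplitude = np.log1p(np.abs(f))        # log-scaled Fourier amplitude
    phase = np.angle(f)                    # Fourier phase
    return np.stack([modulated, amplitude, phase], axis=0)   # (3, H, W) network input
```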
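The second technique is the two-branch day/night depth network. The PyTorch skeleton below is a sketch under assumed layer sizes and a simple concatenation-based fusion: it only illustrates how a CNN branch and a Transformer branch can encode a scene and be fused for depth prediction. The CycleGAN day-night translation is treated as an external, pretrained component, and the thesis's actual architecture, fusion rule, and self-supervised training are not reproduced here.

```python
# Sketch of a two-branch (CNN + Transformer) depth encoder; all sizes are assumptions.
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    def __init__(self, out_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):                          # (B, 3, H, W) -> (B, C, H/8, W/8)
        return self.net(x)

class TransformerBranch(nn.Module):
    def __init__(self, dim=128, patch=8, depth=4, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patch embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
    def forward(self, x):                          # (B, 3, H, W) -> (B, C, H/8, W/8)
        tokens = self.embed(x)
        b, c, h, w = tokens.shape
        seq = self.encoder(tokens.flatten(2).transpose(1, 2))   # (B, N, C)
        return seq.transpose(1, 2).reshape(b, c, h, w)

class TwoBranchDepthNet(nn.Module):
    """Fuses local CNN features with global Transformer features (H, W divisible by 8)."""
    def __init__(self, dim=128):
        super().__init__()
        self.cnn = CNNBranch(dim)
        self.vit = TransformerBranch(dim)
        self.decoder = nn.Sequential(              # simple fused decoder (assumption)
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(dim, 1, 3, padding=1), nn.Sigmoid(),     # normalized inverse depth
        )
    def forward(self, image, translated=None):
        # `translated` stands in for the CycleGAN day<->night counterpart; how the
        # translated image is consumed is an assumption made for this sketch.
        local_feat = self.cnn(image)
        global_feat = self.vit(translated if translated is not None else image)
        return self.decoder(torch.cat([local_feat, global_feat], dim=1))
```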
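The third technique is the network-agnostic stereo generalization method. The sketch below shows one plausible reading of its two ingredients: a view consistency constraint between left features and disparity-warped right features, and a refinement head fed by the resulting cosine similarity map plus shallow features. The warping scheme, loss form, and refinement head are assumptions rather than the thesis's implementation.

```python
# Sketch of a feature-level view consistency loss and similarity-guided disparity
# refinement for a generic stereo matching network (details are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_right_to_left(feat_right, disparity):
    """Warp right-view features into the left view using a disparity map (B, 1, H, W)."""
    b, c, h, w = feat_right.shape
    xs = torch.linspace(-1, 1, w, device=feat_right.device).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1, 1, h, device=feat_right.device).view(1, h, 1).expand(b, h, w)
    xs = xs - 2.0 * disparity.squeeze(1) / max(w - 1, 1)      # shift x by disparity
    grid = torch.stack([xs, ys], dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(feat_right, grid, align_corners=True)

def view_consistency_loss(feat_left, feat_right, disparity):
    """Encourage left features and warped right features to agree (cosine similarity)."""
    warped = warp_right_to_left(feat_right, disparity)
    sim = F.cosine_similarity(feat_left, warped, dim=1)       # (B, H, W) in [-1, 1]
    return (1.0 - sim).mean(), sim

class SimilarityGuidedRefinement(nn.Module):
    """Refine a coarse disparity map from the similarity map and shallow features."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + 2, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, disparity, similarity, shallow_feat):
        x = torch.cat([disparity, similarity.unsqueeze(1), shallow_feat], dim=1)
        return disparity + self.net(x)                        # residual correction
```

In a generic stereo pipeline, the consistency loss would presumably be added to the training objective before cost-volume construction, and the refinement module applied to the coarse disparity produced by cost-volume aggregation.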

 

DC Field | Value | Language
dc.contributor.advisor | Wong, KKY | -
dc.contributor.advisor | Tsia, KKM | -
dc.contributor.author | Zhang, Zezheng | -
dc.contributor.author | 張澤正 | -
dc.date.accessioned | 2024-10-23T09:46:04Z | -
dc.date.available | 2024-10-23T09:46:04Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Zhang, Z. [張澤正]. (2024). Improving depth estimation through generalization enhancement in deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/350309 | -
dc.description.abstract | Depth perception plays a critical role in human interaction with the external world, enabling navigation, distance estimation, and object interaction. As applications that mimic human behaviors, such as autonomous driving, simultaneous localization and mapping (SLAM), and augmented reality (AR), rapidly advance, depth perception research has become increasingly important. Additionally, addressing the defocus problem in biomedical images requires advances in depth perception techniques, such as defocusing distance prediction. Data-driven methods, especially deep learning, have played a crucial role in the advancement of depth estimation by capturing complex relationships between images and their corresponding depth information. However, existing methods still have room for improvement, particularly in the generalization of deep learning models to unseen data. In this thesis, we focus on enhancing the generalization of depth estimation in several directions. First, for whole slide imaging (WSI) used in pathological diagnosis, where high-resolution images are difficult to capture in sharp focus, deep learning-based defocusing distance prediction struggles, especially on images collected with protocols that differ from those of the training dataset. We propose quantized spatial phase modulation as a preprocessing method, which applies a quantized phase mask in the Fourier domain of the input image and uses the post-processed image, together with its Fourier amplitude and Fourier phase, as input to the neural network. This makes images with similar defocusing distances more distinguishable, leading to improved generalization performance. Next, for macroscopic images such as outdoor driving scenes, which involve complex changes in external conditions, addressing the domain generalization problem becomes even more crucial. One challenge is to perform accurate depth estimation under both daytime and nighttime conditions in order to meet the demand for all-day driving. We introduce a two-branch network that incorporates both a CNN and a Transformer as encoders. By leveraging the complementary strengths of the two encoders and using a pretrained CycleGAN to transfer images between domains, the representational ability of the neural network is greatly enhanced, enabling accurate depth estimation for both daytime and nighttime images within a single model. When two images are available as input, stereo matching offers a more accurate route to depth estimation. To address the performance degradation of stereo matching on unseen domains, we propose a domain generalization method applicable to most stereo matching networks. A view consistency constraint is applied to the left and right features before constructing the cost volume. In addition, disparity refinement is performed based on the cosine similarity map and the shallower features of the stereo matching network. Applying this method yields significant generalization improvements over several baseline models. Through these proposed techniques, our research aims to enhance the generalization of depth estimation across various sub-directions, covering both micro and macro images as well as supervised and self-supervised approaches. The outcomes of this research have the potential to enhance the quality and fidelity of depth perception, contributing to advancements in perceiving the 3D world from 2D images. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Depth perception | -
dc.subject.lcsh | Deep learning (Machine learning) | -
dc.title | Improving depth estimation through generalization enhancement in deep learning | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044861893903414 | -
