
Postgraduate thesis: Deep learning based medical image segmentation and visual question answering

Title: Deep learning based medical image segmentation and visual question answering
Authors: Liu, Sishuo [劉思鑠]
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This work proposes methods for 2D/3D medical image segmentation and for medical visual question answering. For 2D segmentation, we develop a self-learning correction paradigm for semi-supervised biomedical image segmentation: a coarse-to-fine strategy that adopts lesion inpainting as a self-supervised pretext task on unlabeled data, deriving additional supervision signals that guide the network toward better feature representations and improved segmentation. For 3D segmentation, we develop 3D UNeXt, a hybrid CNN-MLP network based on U-Net that balances complexity and performance: convolutional layers capture low-level local features, while MLP blocks propagate global context and model long-range dependencies, enabling more effective 3D segmentation. For medical visual question answering, we introduce a cross-modal self-attention module that selectively captures long-range visual-linguistic contextual relevance for effective feature fusion. More importantly, we reformulate image-feature pre-training as a multi-task learning paradigm, so that image features encode richer contextual clues and become more applicable to multimodal fusion and question answering. In summary, we propose 2D/3D segmentation methods and multi-task pre-training with cross-modal self-attention for medical visual question answering. Our techniques achieve superior performance while balancing model complexity, demonstrating their potential to improve segmentation, 3D analysis, and multimodal learning for medical imaging applications. The proposed frameworks are steps towards more comprehensive computational models for medical image analysis. (Illustrative code sketches of the key components follow this record.)
Degree: Doctor of Philosophy
Subjects: Deep learning (Machine learning); Information visualization; Natural language processing (Computer science)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/335164
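
The abstract's lesion-inpainting pretext task for semi-supervised 2D segmentation can be pictured as follows. The thesis does not publish code, so this is a minimal PyTorch sketch under assumed names and hyperparameters: random patches of an unlabeled image are masked out and the backbone is trained to reconstruct them, yielding a supervision signal that needs no labels.

import torch
import torch.nn.functional as F

def random_patch_mask(images, patch=16, drop_ratio=0.3):
    # Keep each patch with probability (1 - drop_ratio); 1 = visible, 0 = masked.
    # Assumes image height and width are divisible by `patch`.
    b, c, h, w = images.shape
    keep = (torch.rand(b, 1, h // patch, w // patch, device=images.device) > drop_ratio).float()
    keep = F.interpolate(keep, size=(h, w), mode="nearest")
    return images * keep, keep

def inpainting_pretext_loss(backbone, unlabeled_images):
    # `backbone` is assumed to expose an image-reconstruction head; the network
    # must fill in the hidden regions, so the loss is computed only there.
    corrupted, keep = random_patch_mask(unlabeled_images)
    recon = backbone(corrupted)
    masked = 1.0 - keep
    return F.l1_loss(recon * masked, unlabeled_images * masked)

In a semi-supervised loop this loss would be added to the supervised segmentation loss on labeled images, with the reconstruction head discarded at inference; the details of the coarse-to-fine correction stage are not reproduced here.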
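The hybrid CNN-MLP design of 3D UNeXt can likewise be sketched generically: a 3D convolution extracts local features, then a token-mixing MLP spans all voxel positions to propagate global context. All dimensions and the single-block layout are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridConvMLPBlock3D(nn.Module):
    # Local features via 3D convolution, global context via an MLP that mixes
    # the flattened spatial axis (a fixed volume size D*H*W is assumed).
    def __init__(self, channels, num_voxels, hidden=256):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_voxels, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_voxels),
        )

    def forward(self, x):                         # x: (B, C, D, H, W)
        x = F.gelu(self.conv(x))                  # low-level local feature extraction
        b, c, d, h, w = x.shape
        tokens = x.flatten(2)                     # (B, C, N) with N = D*H*W
        tokens = tokens + self.token_mlp(tokens)  # long-range dependency mixing
        return tokens.view(b, c, d, h, w)

Stacking such blocks inside a U-Net-style encoder-decoder gives the division of labour the abstract describes, with global mixing at lower cost than full 3D self-attention.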
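Finally, the cross-modal self-attention module for medical visual question answering can be approximated by running standard multi-head self-attention over the concatenation of image-region and question-word tokens, so every token attends across both modalities. Embedding size, head count, and the single-layer design are assumptions for this sketch.

import torch
import torch.nn as nn

class CrossModalSelfAttention(nn.Module):
    # Joint self-attention over concatenated visual and linguistic tokens;
    # relevance is captured jointly across both modalities.
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, txt_tokens):
        # vis_tokens: (B, Nv, dim) image-region features
        # txt_tokens: (B, Nt, dim) question-word features
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)
        fused, _ = self.attn(seq, seq, seq)   # long-range visual-linguistic attention
        fused = self.norm(seq + fused)        # residual connection + normalisation
        n_v = vis_tokens.size(1)
        return fused[:, :n_v], fused[:, n_v:] # fused features, split per modality

The multi-task pre-training the abstract mentions would train the image encoder that produces vis_tokens on several auxiliary objectives before this fusion step; those task heads are not specified in the record and are omitted here.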

 

DC Field: Value
dc.contributor.author: Liu, Sishuo
dc.contributor.author: 劉思鑠
dc.date.accessioned: 2023-11-13T07:45:06Z
dc.date.available: 2023-11-13T07:45:06Z
dc.date.issued: 2023
dc.identifier.citation: Liu, S. [劉思鑠]. (2023). Deep learning based medical image segmentation and visual question answering. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/335164
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Deep learning (Machine learning)
dc.subject.lcsh: Information visualization
dc.subject.lcsh: Natural language processing (Computer science)
dc.title: Deep learning based medical image segmentation and visual question answering
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2023
dc.identifier.mmsid: 991044736500003414
