Article: Hybrid Masked Image Modeling for 3D Medical Image Segmentation

Title: Hybrid Masked Image Modeling for 3D Medical Image Segmentation
Authors: Xing, Zhaohu; Zhu, Lei; Yu, Lequan; Xing, Zhiheng; Wan, Liang
Keywords: 3D medical image segmentation; Masked image modeling; Self-supervised learning
Issue Date: 1-Apr-2024
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Journal of Biomedical and Health Informatics, 2024, v. 28, n. 4, p. 2115-2125
Abstract: Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods mask random patches of the image and reconstruct the missing pixels, which captures semantic information only at a low level and incurs a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy that specifies which patches in sub-volumes are masked and how, effectively imposing constraints from higher-level semantic information. We then learn the semantic information of medical images at three levels: 1) partial region prediction, which reconstructs key contents of the 3D image and largely reduces the pre-training time (pixel level); 2) patch-masking perception, which learns the spatial relationships between the patches in each sub-volume (region level); and 3) dropout-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample level). The proposed framework supports both CNN and transformer encoder backbones, and also enables pre-training of decoders for image segmentation. We conduct comprehensive experiments on five widely used public medical image segmentation datasets: BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM over competing supervised methods, masked pre-training approaches, and other self-supervised methods in terms of quantitative metrics, speed, and qualitative observations. The code of HybridMIM is available at https://github.com/ge-xing/HybridMIM.
(An illustrative code sketch of the two-level masking and the sample-level contrastive objective is given below, after the journal metrics.)
Persistent Identifier: http://hdl.handle.net/10722/345591
ISSN: 2168-2194
2023 Impact Factor: 6.7
2023 SCImago Journal Rankings: 1.964
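
As a concrete illustration of the pre-training scheme described in the abstract, the sketch below shows one plausible PyTorch implementation of the two-level masking hierarchy: level one picks sub-volumes, level two picks patches within each picked sub-volume. All names, sizes, and masking ratios here (hierarchical_mask, sub_size, patch_size, sub_mask_ratio, patch_mask_ratio) are assumptions for illustration, not the configuration used in HybridMIM's paper or repository.

```python
import torch

def hierarchical_mask(volume, sub_size=32, patch_size=8,
                      sub_mask_ratio=0.5, patch_mask_ratio=0.6):
    """Two-level masking sketch: level 1 picks sub-volumes, level 2 picks
    patches inside each picked sub-volume. `volume` is a (C, D, H, W) tensor
    whose spatial dims are assumed divisible by `sub_size`. All defaults are
    illustrative guesses, not HybridMIM's actual settings."""
    C, D, H, W = volume.shape
    sd, sh, sw = D // sub_size, H // sub_size, W // sub_size
    n_sub = sd * sh * sw
    # Level 1: choose which sub-volumes receive masking at all.
    picked = torch.randperm(n_sub)[: int(n_sub * sub_mask_ratio)]
    p = sub_size // patch_size          # patches per axis inside a sub-volume
    n_patch = p ** 3
    mask = torch.zeros(D, H, W, dtype=torch.bool)
    for s in picked.tolist():
        z, rem = divmod(s, sh * sw)     # sub-volume grid coordinates
        y, x = divmod(rem, sw)
        # Level 2: choose which patches inside this sub-volume are masked.
        patches = torch.randperm(n_patch)[: int(n_patch * patch_mask_ratio)]
        for q in patches.tolist():
            pz, prem = divmod(q, p * p)
            py, px = divmod(prem, p)
            z0 = z * sub_size + pz * patch_size
            y0 = y * sub_size + py * patch_size
            x0 = x * sub_size + px * patch_size
            mask[z0:z0 + patch_size, y0:y0 + patch_size, x0:x0 + patch_size] = True
    # Zero out masked voxels; the pixel-level branch would reconstruct them,
    # and the region-level branch would predict `mask` within each sub-volume.
    return volume.masked_fill(mask.unsqueeze(0), 0.0), mask
```

The sample-level component, dropout-based contrastive learning within a mini-batch, reads like a SimCSE-style objective: the same batch is encoded twice with dropout active, the two views of each sample are positives, and all other samples in the batch are negatives. Again, this is a hedged sketch under those assumptions, not the authors' code; the `temperature` value is an illustrative default.

```python
import torch
import torch.nn.functional as F

def dropout_contrastive_loss(encoder, x, temperature=0.1):
    """InfoNCE over two dropout-perturbed encodings of the same batch.
    `encoder` must contain dropout layers and be in train() mode so the
    two forward passes differ."""
    z1 = F.normalize(encoder(x), dim=-1)   # (B, d) first dropout view
    z2 = F.normalize(encoder(x), dim=-1)   # (B, d) second dropout view
    logits = z1 @ z2.t() / temperature     # (B, B) cosine similarities
    labels = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, labels) # diagonal entries = positive pairs
```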


DC Field: Value
dc.contributor.author: Xing, Zhaohu
dc.contributor.author: Zhu, Lei
dc.contributor.author: Yu, Lequan
dc.contributor.author: Xing, Zhiheng
dc.contributor.author: Wan, Liang
dc.date.accessioned: 2024-08-27T09:09:52Z
dc.date.available: 2024-08-27T09:09:52Z
dc.date.issued: 2024-04-01
dc.identifier.citation: IEEE Journal of Biomedical and Health Informatics, 2024, v. 28, n. 4, p. 2115-2125
dc.identifier.issn: 2168-2194
dc.identifier.uri: http://hdl.handle.net/10722/345591
dc.description.abstract: Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods mask random patches of the image and reconstruct the missing pixels, which captures semantic information only at a low level and incurs a long pre-training time. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy that specifies which patches in sub-volumes are masked and how, effectively imposing constraints from higher-level semantic information. We then learn the semantic information of medical images at three levels: 1) partial region prediction, which reconstructs key contents of the 3D image and largely reduces the pre-training time (pixel level); 2) patch-masking perception, which learns the spatial relationships between the patches in each sub-volume (region level); and 3) dropout-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample level). The proposed framework supports both CNN and transformer encoder backbones, and also enables pre-training of decoders for image segmentation. We conduct comprehensive experiments on five widely used public medical image segmentation datasets: BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM over competing supervised methods, masked pre-training approaches, and other self-supervised methods in terms of quantitative metrics, speed, and qualitative observations. The code of HybridMIM is available at https://github.com/ge-xing/HybridMIM.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Journal of Biomedical and Health Informatics
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: 3D medical image segmentation
dc.subject: Masked image modeling
dc.subject: Self-supervised learning
dc.title: Hybrid Masked Image Modeling for 3D Medical Image Segmentation
dc.type: Article
dc.identifier.doi: 10.1109/JBHI.2024.3360239
dc.identifier.pmid: 38289846
dc.identifier.scopus: eid_2-s2.0-85184315889
dc.identifier.volume: 28
dc.identifier.issue: 4
dc.identifier.spage: 2115
dc.identifier.epage: 2125
dc.identifier.eissn: 2168-2208
dc.identifier.issnl: 2168-2194
