Conference Paper: Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning

Title: Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning
Authors: Chen, Cheng; Zhong, Aoxiao; Wu, Dufan; Luo, Jie; Li, Quanzheng
Keywords: contrastive learning; image-text representation learning; masked autoencoding
Issue Date: 2023
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, v. 14224 LNCS, p. 493-503
Abstract: Self-supervised learning (SSL) of visual representations from paired medical images and text reports has recently shown great promise for various downstream tasks. However, previous work has focused on investigating the effectiveness of two major SSL techniques separately, i.e., contrastive learning and masked autoencoding, without exploring their potential synergies. In this paper, we aim to integrate the strengths of these two techniques by proposing a contrastive masked image-text modeling framework for medical visual representation learning. On one hand, our framework conducts cross-modal contrastive learning between masked medical images and text reports, with a representation decoder being incorporated to recover the misaligned information in the masked images. On the other hand, to further leverage masked autoencoding, a masked image is also required to be able to reconstruct the original image itself and the masked information in the text reports. With pre-training on a large-scale medical image and report dataset, our framework shows complementary benefits of integrating the two SSL techniques on four downstream classification datasets. Extensive evaluations demonstrate consistent improvements of our method over state-of-the-art approaches, especially when very scarce labeled data are available. Code is available at https://github.com/cchen-cc/CMITM.
Persistent Identifier: http://hdl.handle.net/10722/349975
ISSN: 0302-9743
2023 SCImago Journal Rankings: 0.606
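The abstract describes a framework that combines a cross-modal contrastive objective (between masked images and text reports) with masked-autoencoding reconstruction. The sketch below is only an illustration of how such a combined training signal is typically formed, using numpy with random tensors standing in for real encoder and decoder outputs; it is not the paper's actual implementation, and the masking ratio, temperature, and loss weighting are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(patches, mask_ratio=0.75):
    """MAE-style random masking: keep a fraction of patches, hide the rest."""
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep, masked = perm[:n_keep], perm[n_keep:]
    return patches[keep], keep, masked

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-modal contrastive (InfoNCE) loss over a batch;
    matching image-report pairs sit on the diagonal of the logit matrix."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def xent(l):
        # cross-entropy of the softmax rows against the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(logp)))

    return 0.5 * (xent(logits) + xent(logits.T))

def mse(pred, target):
    """Reconstruction loss on the masked patches only."""
    return float(np.mean((pred - target) ** 2))

# Toy batch: 4 image-report pairs; embeddings stand in for encoder outputs.
batch, dim, n_patches, patch_dim = 4, 32, 16, 8
img_embs = rng.standard_normal((batch, dim))
txt_embs = rng.standard_normal((batch, dim))
contrastive = info_nce(img_embs, txt_embs)

# Reconstruction branch: a placeholder "decoder output" predicts masked patches.
patches = rng.standard_normal((n_patches, patch_dim))
visible, keep, masked = random_mask(patches)
pred_masked = rng.standard_normal((len(masked), patch_dim))  # decoder stand-in
recon = mse(pred_masked, patches[masked])

# Combined objective; a balancing weight between terms is omitted here.
total_loss = contrastive + recon
```

In the actual framework the two terms are produced by shared image encoders, so gradients from both objectives shape the same visual representation.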

 

DC Field | Value
dc.contributor.author | Chen, Cheng
dc.contributor.author | Zhong, Aoxiao
dc.contributor.author | Wu, Dufan
dc.contributor.author | Luo, Jie
dc.contributor.author | Li, Quanzheng
dc.date.accessioned | 2024-10-17T07:02:14Z
dc.date.available | 2024-10-17T07:02:14Z
dc.date.issued | 2023
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, v. 14224 LNCS, p. 493-503
dc.identifier.issn | 0302-9743
dc.identifier.uri | http://hdl.handle.net/10722/349975
dc.description.abstract | Self-supervised learning (SSL) of visual representations from paired medical images and text reports has recently shown great promise for various downstream tasks. However, previous work has focused on investigating the effectiveness of two major SSL techniques separately, i.e., contrastive learning and masked autoencoding, without exploring their potential synergies. In this paper, we aim to integrate the strengths of these two techniques by proposing a contrastive masked image-text modeling framework for medical visual representation learning. On one hand, our framework conducts cross-modal contrastive learning between masked medical images and text reports, with a representation decoder being incorporated to recover the misaligned information in the masked images. On the other hand, to further leverage masked autoencoding, a masked image is also required to be able to reconstruct the original image itself and the masked information in the text reports. With pre-training on a large-scale medical image and report dataset, our framework shows complementary benefits of integrating the two SSL techniques on four downstream classification datasets. Extensive evaluations demonstrate consistent improvements of our method over state-of-the-art approaches, especially when very scarce labeled data are available. Code is available at https://github.com/cchen-cc/CMITM.
dc.language | eng
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.subject | contrastive learning
dc.subject | Image-text representation learning
dc.subject | masked autoencoding
dc.title | Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning
dc.type | Conference_Paper
dc.description.nature | link_to_subscribed_fulltext
dc.identifier.doi | 10.1007/978-3-031-43904-9_48
dc.identifier.scopus | eid_2-s2.0-85174734600
dc.identifier.volume | 14224 LNCS
dc.identifier.spage | 493
dc.identifier.epage | 503
dc.identifier.eissn | 1611-3349
