File Download
There are no files associated with this item.

Links for fulltext (may require subscription):
- Publisher website (DOI): 10.1007/978-3-031-43904-9_48
- Scopus: eid_2-s2.0-85174734600
Citations:
- Scopus: 0

Appears in Collections:
Conference Paper: Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning
| Title | Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning |
|---|---|
| Authors | Chen, Cheng; Zhong, Aoxiao; Wu, Dufan; Luo, Jie; Li, Quanzheng |
| Keywords | contrastive learning; image-text representation learning; masked autoencoding |
| Issue Date | 2023 |
| Citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, v. 14224 LNCS, p. 493-503 |
| Abstract | Self-supervised learning (SSL) of visual representations from paired medical images and text reports has recently shown great promise for various downstream tasks. However, previous work has focused on investigating the effectiveness of two major SSL techniques separately, i.e., contrastive learning and masked autoencoding, without exploring their potential synergies. In this paper, we aim to integrate the strengths of these two techniques by proposing a contrastive masked image-text modeling framework for medical visual representation learning. On one hand, our framework conducts cross-modal contrastive learning between masked medical images and text reports, with a representation decoder being incorporated to recover the misaligned information in the masked images. On the other hand, to further leverage masked autoencoding, a masked image is also required to be able to reconstruct the original image itself and the masked information in the text reports. With pre-training on a large-scale medical image and report dataset, our framework shows complementary benefits of integrating the two SSL techniques on four downstream classification datasets. Extensive evaluations demonstrate consistent improvements of our method over state-of-the-art approaches, especially when labeled data are very scarce. Code is available at https://github.com/cchen-cc/CMITM. |
| Persistent Identifier | http://hdl.handle.net/10722/349975 |
| ISSN | 0302-9743 (2023 SCImago Journal Rankings: 0.606) |
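For illustration only, the combined objective described in the abstract (cross-modal contrastive alignment between masked images and reports, plus a masked-reconstruction term) can be sketched roughly as follows. The function names, the symmetric InfoNCE formulation, and the weighting factor `lam` are assumptions for this sketch, not the paper's actual implementation; see the linked repository for the authors' code.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (B, B); matched pairs on the diagonal
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()   # cross-entropy against the diagonal

    # Average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))

def cmitm_loss(img_emb, txt_emb, recon, target, lam=1.0):
    """Contrastive term plus a masked-reconstruction (MSE) term, weighted by lam."""
    return info_nce(img_emb, txt_emb) + lam * np.mean((recon - target) ** 2)
```

Aligned pairs drive the diagonal of the similarity matrix up relative to the off-diagonal entries, while the reconstruction term is zero only when the masked content is recovered exactly.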
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chen, Cheng | - |
dc.contributor.author | Zhong, Aoxiao | - |
dc.contributor.author | Wu, Dufan | - |
dc.contributor.author | Luo, Jie | - |
dc.contributor.author | Li, Quanzheng | - |
dc.date.accessioned | 2024-10-17T07:02:14Z | - |
dc.date.available | 2024-10-17T07:02:14Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, v. 14224 LNCS, p. 493-503 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.uri | http://hdl.handle.net/10722/349975 | - |
dc.description.abstract | Self-supervised learning (SSL) of visual representations from paired medical images and text reports has recently shown great promise for various downstream tasks. However, previous work has focused on investigating the effectiveness of two major SSL techniques separately, i.e., contrastive learning and masked autoencoding, without exploring their potential synergies. In this paper, we aim to integrate the strengths of these two techniques by proposing a contrastive masked image-text modeling framework for medical visual representation learning. On one hand, our framework conducts cross-modal contrastive learning between masked medical images and text reports, with a representation decoder being incorporated to recover the misaligned information in the masked images. On the other hand, to further leverage masked autoencoding, a masked image is also required to be able to reconstruct the original image itself and the masked information in the text reports. With pre-training on a large-scale medical image and report dataset, our framework shows complementary benefits of integrating the two SSL techniques on four downstream classification datasets. Extensive evaluations demonstrate consistent improvements of our method over state-of-the-art approaches, especially when labeled data are very scarce. Code is available at https://github.com/cchen-cc/CMITM. | -
dc.language | eng | - |
dc.relation.ispartof | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | - |
dc.subject | contrastive learning | - |
dc.subject | Image-text representation learning | - |
dc.subject | masked autoencoding | - |
dc.title | Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1007/978-3-031-43904-9_48 | - |
dc.identifier.scopus | eid_2-s2.0-85174734600 | - |
dc.identifier.volume | 14224 LNCS | - |
dc.identifier.spage | 493 | - |
dc.identifier.epage | 503 | - |
dc.identifier.eissn | 1611-3349 | - |