Conference Paper: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering

Title: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
Authors: Gong, H; Chen, G; Liu, S; Yu, Y; Li, G
Keywords: Visual question answering; transfer learning; multi-task learning; self-attention
Issue Date: 2021
Publisher: Association for Computing Machinery.
Citation: Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460
Abstract: Due to the severe lack of labeled data, existing methods for medical visual question answering usually rely on transfer learning to obtain effective image feature representations and on cross-modal fusion of visual and linguistic features to predict question-related answers. These two phases are performed independently, without considering the compatibility and applicability of the pre-trained features for cross-modal fusion. We therefore reformulate image feature pre-training as a multi-task learning paradigm, which forces it to take into account the applicability of the features for the specific image comprehension task, and observe a clear improvement. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA.
Description: The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021 due to the changing dynamics of the COVID-19 pandemic.
Persistent Identifier: http://hdl.handle.net/10722/301301
ISBN: 9781450384636
ISI Accession Number ID: WOS:000723651900053
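Note: The abstract describes a cross-modal self-attention (CMSA) module that fuses visual and linguistic features by letting tokens from both modalities attend to each other. The PyTorch sketch below is only a rough illustration of one common way such a module can be built (concatenate the image and question token sequences, then run multi-head self-attention over the joint sequence). All class names, dimensions, and design choices here are assumptions made for illustration; they do not reproduce the authors' implementation, which is available at the repository linked in the abstract.

# Hypothetical sketch of cross-modal self-attention fusion; not the authors' code.
import torch
import torch.nn as nn

class CrossModalSelfAttention(nn.Module):
    """Concatenates visual and question token sequences and applies multi-head
    self-attention over the joint sequence, so every token can attend to
    long-range context in both modalities."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # visual:   (batch, num_image_tokens, dim), e.g. a flattened CNN feature map
        # question: (batch, num_word_tokens, dim), e.g. word features from a text encoder
        joint = torch.cat([visual, question], dim=1)   # (batch, V + Q, dim)
        fused, _ = self.attn(joint, joint, joint)      # self-attention across both modalities
        fused = self.norm(joint + fused)               # residual connection + layer norm
        return fused.mean(dim=1)                       # pooled joint representation for answer prediction

# Example usage with toy shapes (assumed, for illustration only)
if __name__ == "__main__":
    cmsa = CrossModalSelfAttention(dim=512, num_heads=8)
    v = torch.randn(2, 49, 512)   # 7x7 grid of image features
    q = torch.randn(2, 12, 512)   # 12 question word features
    print(cmsa(v, q).shape)       # torch.Size([2, 512])

The pooled output would typically feed a classifier over the answer vocabulary; the multi-task pre-training of the image encoder described in the abstract is a separate stage and is not sketched here.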

 

DC Field: Value
dc.contributor.author: Gong, H
dc.contributor.author: Chen, G
dc.contributor.author: Liu, S
dc.contributor.author: Yu, Y
dc.contributor.author: Li, G
dc.date.accessioned: 2021-07-27T08:09:06Z
dc.date.available: 2021-07-27T08:09:06Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460
dc.identifier.isbn: 9781450384636
dc.identifier.uri: http://hdl.handle.net/10722/301301
dc.description: The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021 due to the changing dynamics of the COVID-19 pandemic.
dc.description.abstract: Due to the severe lack of labeled data, existing methods for medical visual question answering usually rely on transfer learning to obtain effective image feature representations and on cross-modal fusion of visual and linguistic features to predict question-related answers. These two phases are performed independently, without considering the compatibility and applicability of the pre-trained features for cross-modal fusion. We therefore reformulate image feature pre-training as a multi-task learning paradigm, which forces it to take into account the applicability of the features for the specific image comprehension task, and observe a clear improvement. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA.
dc.language: eng
dc.publisher: Association for Computing Machinery.
dc.relation.ispartof: International Conference on Multimedia Retrieval (ICMR), 2021
dc.rights: International Conference on Multimedia Retrieval (ICMR), 2021. Copyright © Association for Computing Machinery.
dc.subject: Visual question answering
dc.subject: transfer learning
dc.subject: multi-task learning
dc.subject: self-attention
dc.title: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
dc.type: Conference_Paper
dc.identifier.email: Liu, S: sishuo@hku.hk
dc.identifier.email: Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority: Yu, Y=rp01415
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3460426.3463584
dc.identifier.scopus: eid_2-s2.0-85113541872
dc.identifier.hkuros: 323546
dc.identifier.spage: 456
dc.identifier.epage: 460
dc.identifier.isi: WOS:000723651900053
dc.publisher.place: United States
