Conference Paper: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering

Title: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
Authors: Gong, H; Chen, G; Liu, S; Yu, Y; Li, G
Keywords: Visual question answering; transfer learning; multi-task learning; self-attention
Issue Date: 2021
Publisher: Association for Computing Machinery.
Citation: Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460
Abstract: Due to the severe lack of labeled data, existing methods for medical visual question answering usually rely on transfer learning to obtain effective image feature representations and on cross-modal fusion of visual and linguistic features to predict question-related answers. These two phases are performed independently, without considering the compatibility and applicability of the pre-trained features for cross-modal fusion. We therefore reformulate image feature pre-training as a multi-task learning paradigm, which forces it to take into account the applicability of the features for the specific image comprehension task, and observe a clear improvement. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA.
Description: The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021 due to the changing dynamics of the COVID-19 pandemic.
Persistent Identifier: http://hdl.handle.net/10722/301301
ISBN: 9781450384636
ISI Accession Number ID: WOS:000723651900053
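Note: The abstract describes a cross-modal self-attention (CMSA) module that fuses visual and linguistic features by letting tokens from both modalities attend to each other. The PyTorch sketch below is only a rough illustration of one common way such a module can be built (concatenate the image and question token sequences, then run multi-head self-attention over the joint sequence). All class names, dimensions, and design choices here are assumptions made for illustration; they do not reproduce the authors' implementation, which is available at the repository linked in the abstract.

# Hypothetical sketch of cross-modal self-attention fusion; not the authors' code.
import torch
import torch.nn as nn

class CrossModalSelfAttention(nn.Module):
    """Concatenates visual and question token sequences and applies multi-head
    self-attention over the joint sequence, so every token can attend to
    long-range context in both modalities."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # visual:   (batch, num_image_tokens, dim), e.g. a flattened CNN feature map
        # question: (batch, num_word_tokens, dim), e.g. word features from a text encoder
        joint = torch.cat([visual, question], dim=1)   # (batch, V + Q, dim)
        fused, _ = self.attn(joint, joint, joint)      # self-attention across both modalities
        fused = self.norm(joint + fused)               # residual connection + layer norm
        return fused.mean(dim=1)                       # pooled joint representation for answer prediction

# Example usage with toy shapes (assumed, for illustration only)
if __name__ == "__main__":
    cmsa = CrossModalSelfAttention(dim=512, num_heads=8)
    v = torch.randn(2, 49, 512)   # 7x7 grid of image features
    q = torch.randn(2, 12, 512)   # 12 question word features
    print(cmsa(v, q).shape)       # torch.Size([2, 512])

The pooled output would typically feed a classifier over the answer vocabulary; the multi-task pre-training of the image encoder described in the abstract is a separate stage and is not sketched here.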

 

DC Field: Value
dc.contributor.author: Gong, H
dc.contributor.author: Chen, G
dc.contributor.author: Liu, S
dc.contributor.author: Yu, Y
dc.contributor.author: Li, G
dc.date.accessioned: 2021-07-27T08:09:06Z
dc.date.available: 2021-07-27T08:09:06Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR-21), Virtual Conference, Taipei, Taiwan, 16-19 November 2021, p. 456-460
dc.identifier.isbn: 9781450384636
dc.identifier.uri: http://hdl.handle.net/10722/301301
dc.description: The conference dates of ICMR 2021 were postponed from 21-24 August to 16-19 November 2021 due to the changing dynamics of the COVID-19 pandemic.
dc.description.abstract: Due to the severe lack of labeled data, existing methods for medical visual question answering usually rely on transfer learning to obtain effective image feature representations and on cross-modal fusion of visual and linguistic features to predict question-related answers. These two phases are performed independently, without considering the compatibility and applicability of the pre-trained features for cross-modal fusion. We therefore reformulate image feature pre-training as a multi-task learning paradigm, which forces it to take into account the applicability of the features for the specific image comprehension task, and observe a clear improvement. Furthermore, we introduce a cross-modal self-attention (CMSA) module to selectively capture long-range contextual relevance for more effective fusion of visual and linguistic features. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods. Our code and models are available at https://github.com/haifangong/CMSA-MTPT-4-MedicalVQA.
dc.language: eng
dc.publisher: Association for Computing Machinery.
dc.relation.ispartof: International Conference on Multimedia Retrieval (ICMR), 2021
dc.rights: International Conference on Multimedia Retrieval (ICMR), 2021. Copyright © Association for Computing Machinery.
dc.subject: Visual question answering
dc.subject: transfer learning
dc.subject: multi-task learning
dc.subject: self-attention
dc.title: Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering
dc.type: Conference_Paper
dc.identifier.email: Liu, S: sishuo@hku.hk
dc.identifier.email: Yu, Y: yzyu@cs.hku.hk
dc.identifier.authority: Yu, Y=rp01415
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3460426.3463584
dc.identifier.scopus: eid_2-s2.0-85113541872
dc.identifier.hkuros: 323546
dc.identifier.spage: 456
dc.identifier.epage: 460
dc.identifier.isi: WOS:000723651900053
dc.publisher.place: United States
