Conference Paper: Structured Attentions for Visual Question Answering

Title: Structured Attentions for Visual Question Answering
Authors: Zhu, Chen; Zhao, Yanpeng; Huang, Shuaiyi; Tu, Kewei; Ma, Yi
Issue Date: 2017
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2017, v. 2017-October, p. 1300-1309
Abstract: Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We show how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on three datasets; it surpasses the best baseline model on the newly released CLEVR dataset [13] by 9.5% and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
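The core technical idea in the abstract, unrolling Mean Field inference on a grid-structured CRF into recurrent differentiable layers, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration rather than the paper's released implementation: the binary attention variable per region, the single scalar pairwise weight, and the names MeanFieldGridAttention and num_iters are all hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldGridAttention(nn.Module):
    """Sketch: unroll mean-field updates on a 4-connected grid CRF.

    Each grid node holds a binary variable ("region is attended").
    Every update uses only differentiable tensor ops, so the loop
    behaves like a stack of recurrent layers trained end to end.
    num_iters and the scalar pairwise weight are illustrative
    choices, not values taken from the paper or its released code.
    """

    def __init__(self, num_iters: int = 3):
        super().__init__()
        self.num_iters = num_iters
        # coupling strength rewarding agreement between neighbours
        self.pairwise = nn.Parameter(torch.tensor(0.5))

    def forward(self, unary: torch.Tensor) -> torch.Tensor:
        # unary: (B, H, W) question-conditioned log-potentials
        q = torch.sigmoid(unary)  # initial per-region beliefs
        for _ in range(self.num_iters):
            # sum current beliefs of the 4 grid neighbours,
            # zero-padded at the image border
            msg = (F.pad(q, (0, 0, 1, 0))[:, :-1, :]    # from above
                   + F.pad(q, (0, 0, 0, 1))[:, 1:, :]   # from below
                   + F.pad(q, (1, 0, 0, 0))[:, :, :-1]  # from the left
                   + F.pad(q, (0, 1, 0, 0))[:, :, 1:])  # from the right
            # mean-field update for a binary pairwise CRF
            q = torch.sigmoid(unary + self.pairwise * msg)
        # normalise the marginals into attention weights over regions
        b, h, w = q.shape
        return F.softmax(q.view(b, -1), dim=1).view(b, h, w)

A forward pass on unary potentials of shape (B, H, W) returns attention weights of the same shape. Because every step is an ordinary tensor operation, gradients flow through all num_iters updates, which is what allows the CRF inference to be trained jointly with the rest of the VQA network.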
Persistent Identifier: http://hdl.handle.net/10722/327173
ISSN: 1550-5499
2023 SCImago Journal Rankings: 12.263
ISI Accession Number ID: WOS:000425498401038

DC Field: Value
dc.contributor.author: Zhu, Chen
dc.contributor.author: Zhao, Yanpeng
dc.contributor.author: Huang, Shuaiyi
dc.contributor.author: Tu, Kewei
dc.contributor.author: Ma, Yi
dc.date.accessioned: 2023-03-31T05:29:29Z
dc.date.available: 2023-03-31T05:29:29Z
dc.date.issued: 2017
dc.identifier.citation: Proceedings of the IEEE International Conference on Computer Vision, 2017, v. 2017-October, p. 1300-1309
dc.identifier.issn: 1550-5499
dc.identifier.uri: http://hdl.handle.net/10722/327173
dc.description.abstract: Visual attention, which assigns weights to image regions according to their relevance to a question, is considered an indispensable part of most Visual Question Answering models. Although questions may involve complex relations among multiple regions, few attention models can effectively encode such cross-region relations. In this paper, we demonstrate the importance of encoding such relations by showing the limited effective receptive field of ResNet on two datasets, and propose to model visual attention as a multivariate distribution over a grid-structured Conditional Random Field on image regions. We show how to convert the iterative inference algorithms, Mean Field and Loopy Belief Propagation, into recurrent layers of an end-to-end neural network. We empirically evaluated our model on three datasets; it surpasses the best baseline model on the newly released CLEVR dataset [13] by 9.5% and the best published model on the VQA dataset [3] by 1.25%. Source code is available at https://github.com/zhuchen03/vqa-sva.
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE International Conference on Computer Vision
dc.title: Structured Attentions for Visual Question Answering
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/ICCV.2017.145
dc.identifier.scopus: eid_2-s2.0-85041915476
dc.identifier.volume: 2017-October
dc.identifier.spage: 1300
dc.identifier.epage: 1309
dc.identifier.isi: WOS:000425498401038
