Conference Paper: Propagating Over Phrase Relations for One-Stage Visual Grounding

Title: Propagating Over Phrase Relations for One-Stage Visual Grounding
Authors: YANG, S; LI, G; Yu, Y
Keywords: One-Stage Phrase Grounding; Linguistic Graph; Relational Propagation; Visual Grounding
Issue Date: 2020
Citation: The 16th European Conference on Computer Vision (ECCV), Online, 23-28 August 2020
Abstract: Phrase-level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence. Its challenge comes not only from large variations in visual contents and unrestricted phrase descriptions but also from unambiguous referrals derived from phrase relational reasoning. In this paper, we propose a linguistic-structure-guided propagation network for one-stage phrase grounding. It explicitly explores the linguistic structure of the sentence and performs relational propagation among noun phrases under the guidance of the linguistic relations between them. Specifically, we first construct a linguistic graph parsed from the sentence and then capture multimodal feature maps for all the phrasal nodes independently. The node features are then propagated over the edges with a tailor-designed relational propagation module and ultimately integrated for final prediction. Experiments on the Flickr30K Entities dataset show that our model outperforms state-of-the-art methods and demonstrate the effectiveness of propagating among phrases with linguistic relations.
Persistent Identifier: http://hdl.handle.net/10722/286647

 

DC Field | Value | Language
dc.contributor.author | YANG, S | -
dc.contributor.author | LI, G | -
dc.contributor.author | Yu, Y | -
dc.date.accessioned | 2020-09-04T13:28:31Z | -
dc.date.available | 2020-09-04T13:28:31Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | The 16th European Conference on Computer Vision (ECCV), Online, 23-28 August 2020 | -
dc.identifier.uri | http://hdl.handle.net/10722/286647 | -
dc.description.abstract | Phrase-level visual grounding aims to locate in an image the corresponding visual regions referred to by multiple noun phrases in a given sentence. Its challenge comes not only from large variations in visual contents and unrestricted phrase descriptions but also from unambiguous referrals derived from phrase relational reasoning. In this paper, we propose a linguistic-structure-guided propagation network for one-stage phrase grounding. It explicitly explores the linguistic structure of the sentence and performs relational propagation among noun phrases under the guidance of the linguistic relations between them. Specifically, we first construct a linguistic graph parsed from the sentence and then capture multimodal feature maps for all the phrasal nodes independently. The node features are then propagated over the edges with a tailor-designed relational propagation module and ultimately integrated for final prediction. Experiments on the Flickr30K Entities dataset show that our model outperforms state-of-the-art methods and demonstrate the effectiveness of propagating among phrases with linguistic relations. | -
dc.language | eng | -
dc.relation.ispartof | European Conference on Computer Vision (ECCV) | -
dc.subject | One-Stage Phrase Grounding | -
dc.subject | Linguistic Graph | -
dc.subject | Relational Propagation | -
dc.subject | Visual Grounding | -
dc.title | Propagating Over Phrase Relations for One-Stage Visual Grounding | -
dc.type | Conference_Paper | -
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | -
dc.identifier.authority | Yu, Y=rp01415 | -
dc.identifier.hkuros | 313949 | -
