File Download

There are no files associated with this item.

Article: Relationship-Embedded Representation Learning for Grounding Referring Expressions

Title: Relationship-Embedded Representation Learning for Grounding Referring Expressions
Authors: YANG, S; LI, G; Yu, Y
Keywords: Referring Expressions; Cross-Modal Relationship Extractor; Gated Graph Convolutional Network
Issue Date: 2020
Publisher: IEEE. The Journal's web site is located at http://www.computer.org/tpami
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, Epub 2020-02-14
Abstract: Grounding referring expressions in images aims to locate the object instance in an image described by a referring expression. It involves a joint understanding of natural language and image content and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core of this problem is not only to extract all the necessary information in both the image and referring expressions, but also to make full use of context information to achieve alignment of cross-modal semantic concepts in the extracted information. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships related to the given expression, with a cross-modal attention mechanism, and represent the extracted information as language-guided visual relation graphs. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic context by fusing information from different modes and propagating multimodal information in the structured relation graphs. Experimental results on three common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, greatly surpasses all existing state-of-the-art methods.
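The abstract describes language-guided relation graphs updated by a gated graph convolution. The following minimal PyTorch sketch illustrates one such language-gated propagation step; the class name, tensor shapes, and the specific attention/gating forms are assumptions for exposition, not the authors' released implementation.

# Illustrative sketch only: one language-gated graph convolution step,
# loosely following the high-level description of CMRE + GGCN above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageGatedGraphConv(nn.Module):
    def __init__(self, node_dim: int, lang_dim: int):
        super().__init__()
        self.attn = nn.Linear(node_dim + lang_dim, 1)          # cross-modal attention score per node
        self.gate = nn.Linear(node_dim + lang_dim, node_dim)   # language-conditioned gate
        self.msg = nn.Linear(node_dim, node_dim)                # message transform

    def forward(self, nodes, lang, adj):
        # nodes: (N, node_dim) visual object features
        # lang:  (lang_dim,)   referring-expression embedding
        # adj:   (N, N) float adjacency matrix of the relation graph
        N = nodes.size(0)
        lang_exp = lang.unsqueeze(0).expand(N, -1)

        # Highlight expression-relevant objects via cross-modal attention.
        scores = self.attn(torch.cat([nodes, lang_exp], dim=-1)).squeeze(-1)   # (N,)
        weights = torch.softmax(scores, dim=0)

        # Gate each node's outgoing message by the language context.
        gates = torch.sigmoid(self.gate(torch.cat([nodes, lang_exp], dim=-1))) # (N, node_dim)
        messages = gates * self.msg(nodes) * weights.unsqueeze(-1)

        # Propagate messages along graph edges and update node states.
        agg = adj @ messages                                                   # (N, node_dim)
        return F.relu(nodes + agg)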
Persistent Identifier: http://hdl.handle.net/10722/289197
ISSN: 0162-8828
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158
ISI Accession Number ID: WOS:000670578800018

 

DC Field | Value | Language
dc.contributor.author | YANG, S | -
dc.contributor.author | LI, G | -
dc.contributor.author | Yu, Y | -
dc.date.accessioned | 2020-10-22T08:09:14Z | -
dc.date.available | 2020-10-22T08:09:14Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, Epub 2020-02-14 | -
dc.identifier.issn | 0162-8828 | -
dc.identifier.uri | http://hdl.handle.net/10722/289197 | -
dc.description.abstract | Grounding referring expressions in images aims to locate the object instance in an image described by a referring expression. It involves a joint understanding of natural language and image content and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core of this problem is not only to extract all the necessary information in both the image and referring expressions, but also to make full use of context information to achieve alignment of cross-modal semantic concepts in the extracted information. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships related to the given expression, with a cross-modal attention mechanism, and represent the extracted information as language-guided visual relation graphs. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic context by fusing information from different modes and propagating multimodal information in the structured relation graphs. Experimental results on three common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, greatly surpasses all existing state-of-the-art methods. | -
dc.language | eng | -
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tpami | -
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | -
dc.rights | IEEE Transactions on Pattern Analysis and Machine Intelligence. Copyright © IEEE. | -
dc.rights | ©20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | -
dc.subject | Referring Expressions | -
dc.subject | Cross-Modal Relationship Extractor | -
dc.subject | Gated Graph Convolutional Network | -
dc.title | Relationship-Embedded Representation Learning for Grounding Referring Expressions | -
dc.type | Article | -
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | -
dc.identifier.authority | Yu, Y=rp01415 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/TPAMI.2020.2973983 | -
dc.identifier.pmid | 32078531 | -
dc.identifier.scopus | eid_2-s2.0-85111789902 | -
dc.identifier.hkuros | 317121 | -
dc.identifier.volume | Epub 2020-02-14 | -
dc.identifier.isi | WOS:000670578800018 | -
dc.publisher.place | United States | -
dc.identifier.issnl | 0162-8828 | -
