Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/TPAMI.2020.2973983
- Scopus: eid_2-s2.0-85111789902
- PMID: 32078531
- WOS: WOS:000670578800018
Article: Relationship-Embedded Representation Learning for Grounding Referring Expressions
Title | Relationship-Embedded Representation Learning for Grounding Referring Expressions |
---|---|
Authors | YANG, S; LI, G; Yu, Y |
Keywords | Referring Expressions; Cross-Modal Relationship Extractor; Gated Graph Convolutional Network |
Issue Date | 2020 |
Publisher | IEEE. The Journal's web site is located at http://www.computer.org/tpami |
Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, Epub 2020-02-14 How to Cite? |
Abstract | Grounding referring expressions in images aims to locate the object instance in an image described by a referring expression. It involves a joint understanding of natural language and image content and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core of this problem is not only to extract all the necessary information in both the image and the referring expression, but also to make full use of context information to achieve alignment of cross-modal semantic concepts in the extracted information. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships related to the given expression, with a cross-modal attention mechanism, and represent the extracted information as language-guided visual relation graphs. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic context by fusing information from different modalities and propagating multimodal information in the structured relation graphs. Experimental results on three common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, greatly surpasses all existing state-of-the-art methods. |
Persistent Identifier | http://hdl.handle.net/10722/289197 |
ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158) |
ISI Accession Number ID | WOS:000670578800018 |
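The abstract describes language-conditioned gated propagation over a visual relation graph. As an illustration only, not the paper's exact formulation, a single gated graph-convolution update over region-proposal nodes might look like the sketch below; all names (`gated_gcn_step`, `W_gate`, the pooled query feature `q`) are hypothetical placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_gcn_step(H, A, W, W_gate, q):
    """One gated graph-convolution update (illustrative sketch).

    H      : (n, d) node features for n region proposals
    A      : (n, n) relation-graph adjacency
    W      : (d, d) propagation weights
    W_gate : (d, d) gate weights conditioned on the query
    q      : (d,)   pooled referring-expression (language) feature
    """
    # Row-normalize the adjacency so each node averages over its neighbors
    deg = A.sum(axis=1, keepdims=True) + 1e-8
    A_norm = A / deg
    # Language-conditioned scalar gate per node: controls how much
    # relational context is mixed into that node's representation
    gate = sigmoid((H * q) @ W_gate).mean(axis=1, keepdims=True)
    # Gated propagation with a residual connection
    return H + gate * (A_norm @ H @ W)
```

The gate is what makes the propagation expression-dependent: nodes whose features align poorly with the query receive less context from their neighbors.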
DC Field | Value | Language |
---|---|---|
dc.contributor.author | YANG, S | - |
dc.contributor.author | LI, G | - |
dc.contributor.author | Yu, Y | - |
dc.date.accessioned | 2020-10-22T08:09:14Z | - |
dc.date.available | 2020-10-22T08:09:14Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, Epub 2020-02-14 | - |
dc.identifier.issn | 0162-8828 | - |
dc.identifier.uri | http://hdl.handle.net/10722/289197 | - |
dc.description.abstract | Grounding referring expressions in images aims to locate the object instance in an image described by a referring expression. It involves a joint understanding of natural language and image content and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core of this problem is not only to extract all the necessary information in both the image and the referring expression, but also to make full use of context information to achieve alignment of cross-modal semantic concepts in the extracted information. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships related to the given expression, with a cross-modal attention mechanism, and represent the extracted information as language-guided visual relation graphs. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic context by fusing information from different modalities and propagating multimodal information in the structured relation graphs. Experimental results on three common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, greatly surpasses all existing state-of-the-art methods. | -
dc.language | eng | - |
dc.publisher | IEEE. The Journal's web site is located at http://www.computer.org/tpami | - |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
dc.rights | IEEE Transactions on Pattern Analysis and Machine Intelligence. Copyright © IEEE. | - |
dc.rights | ©20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | - |
dc.subject | Referring Expressions | - |
dc.subject | Cross-Modal Relationship Extractor | - |
dc.subject | Gated Graph Convolutional Network | - |
dc.title | Relationship-Embedded Representation Learning for Grounding Referring Expressions | - |
dc.type | Article | - |
dc.identifier.email | Yu, Y: yzyu@cs.hku.hk | - |
dc.identifier.authority | Yu, Y=rp01415 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TPAMI.2020.2973983 | - |
dc.identifier.pmid | 32078531 | - |
dc.identifier.scopus | eid_2-s2.0-85111789902 | - |
dc.identifier.hkuros | 317121 | - |
dc.identifier.volume | Epub 2020-02-14 | - |
dc.identifier.isi | WOS:000670578800018 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 0162-8828 | - |