Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Yang, Zhao; Wang, Jiaqi; Tang, Yansong; Chen, Kai; Zhao, Hengshuang; Torr, Philip H.S.


Scopus: eid_2-s2.0-85167968995

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

Title	Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Authors	Yang, Zhao; Wang, Jiaqi; Tang, Yansong; Chen, Kai; Zhao, Hengshuang; Torr, Philip H.S.
Issue Date	2023
Citation	Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, v. 37, p. 3222-3230
Abstract	Referring image segmentation segments an image from a language expression. With the aim of producing high-quality masks, existing methods often adopt iterative learning approaches that rely on RNNs or stacked attention layers to refine vision-language features. Despite their complexity, RNN-based methods are subject to specific encoder choices, while attention-based methods offer limited gains. In this work, we introduce a simple yet effective alternative for progressively learning discriminative multi-modal features. The core idea of our approach is to leverage a continuously updated query as the representation of the target object and, at each iteration, to strengthen multi-modal features strongly correlated to the query while weakening less related ones. As the query is initialized by language features and successively updated by object features, our algorithm gradually shifts from being localization-centric to segmentation-centric. This strategy enables the incremental recovery of missing object parts and/or removal of extraneous parts through iteration. Compared to its counterparts, our method is more versatile: it can be plugged into prior art straightforwardly and consistently brings improvements. Experimental results on the challenging datasets of RefCOCO, RefCOCO+, and G-Ref demonstrate its advantage with respect to the state-of-the-art methods.
Persistent Identifier	http://hdl.handle.net/10722/333641
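
The iterative idea described in the abstract (a query that starts from language features, is used to reweight multi-modal features, and is then updated from the reweighted features) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `refine`, the sigmoid gate, the factor of 2, and the weighted-average query update are all assumptions chosen only to make the loop concrete, and real systems operate on learned feature maps rather than small Python lists.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine(features, language_query, num_iters=3):
    """Toy query-guided refinement (hypothetical, for illustration only).

    features: list of per-position multi-modal feature vectors.
    language_query: vector used to initialize the query.
    Returns the reweighted features, the final query, and the last scores.
    """
    query = list(language_query)
    dim = len(query)
    scores = []
    for _ in range(num_iters):
        # Correlation of each position with the current query, squashed to (0, 1).
        scores = [sigmoid(dot(f, query) / math.sqrt(dim)) for f in features]
        # Assumed gate: 2 * score is > 1 for well-correlated positions
        # (strengthened) and < 1 for poorly correlated ones (weakened).
        features = [[2.0 * s * x for x in f] for f, s in zip(features, scores)]
        # Assumed update: the query becomes the score-weighted mean of the
        # reweighted features, so it drifts from language toward object features.
        total = sum(scores)
        query = [
            sum(s * f[k] for f, s in zip(features, scores)) / total
            for k in range(dim)
        ]
    return features, query, scores


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
    refined, query, scores = refine(feats, [1.0, 0.0])
    print(scores)
```

In this sketch the first two positions point roughly in the direction of the initial query and are amplified over the iterations, while the third points away from it and is attenuated. The specific gating and update rules in the paper may differ.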

DC Field	Value	Language
dc.contributor.author	Yang, Zhao	-
dc.contributor.author	Wang, Jiaqi	-
dc.contributor.author	Tang, Yansong	-
dc.contributor.author	Chen, Kai	-
dc.contributor.author	Zhao, Hengshuang	-
dc.contributor.author	Torr, Philip H.S.	-
dc.date.accessioned	2023-10-06T05:21:14Z	-
dc.date.available	2023-10-06T05:21:14Z	-
dc.date.issued	2023	-
dc.identifier.citation	Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, v. 37, p. 3222-3230	-
dc.identifier.uri	http://hdl.handle.net/10722/333641	-
dc.description.abstract	Referring image segmentation segments an image from a language expression. With the aim of producing high-quality masks, existing methods often adopt iterative learning approaches that rely on RNNs or stacked attention layers to refine vision-language features. Despite their complexity, RNN-based methods are subject to specific encoder choices, while attention-based methods offer limited gains. In this work, we introduce a simple yet effective alternative for progressively learning discriminative multi-modal features. The core idea of our approach is to leverage a continuously updated query as the representation of the target object and, at each iteration, to strengthen multi-modal features strongly correlated to the query while weakening less related ones. As the query is initialized by language features and successively updated by object features, our algorithm gradually shifts from being localization-centric to segmentation-centric. This strategy enables the incremental recovery of missing object parts and/or removal of extraneous parts through iteration. Compared to its counterparts, our method is more versatile: it can be plugged into prior art straightforwardly and consistently brings improvements. Experimental results on the challenging datasets of RefCOCO, RefCOCO+, and G-Ref demonstrate its advantage with respect to the state-of-the-art methods.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023	-
dc.title	Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-85167968995	-
dc.identifier.volume	37	-
dc.identifier.spage	3222	-
dc.identifier.epage	3230	-
