Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

Han, Cong; Zhong, Yujie; Li, Dengjie; Han, Kai; Ma, Lin

Appears in Collections:
- Statistics & Actuarial Science: Conference papers

Conference Paper: Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

Title	Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
Authors	Han, Cong; Zhong, Yujie; Li, Dengjie; Han, Kai; Ma, Lin
Issue Date	2-Oct-2023
Abstract	Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model. However, existing two-stream methods require passing a great number of (up to a hundred) image crops into the visual-language model, which is highly inefficient. To address the problem, we propose a network that only needs a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference.
Persistent Identifier	http://hdl.handle.net/10722/339384

DC Field	Value	Language
dc.contributor.author	Han, Cong	-
dc.contributor.author	Zhong, Yujie	-
dc.contributor.author	Li, Dengjie	-
dc.contributor.author	Han, Kai	-
dc.contributor.author	Ma, Lin	-
dc.date.accessioned	2024-03-11T10:36:10Z	-
dc.date.available	2024-03-11T10:36:10Z	-
dc.date.issued	2023-10-02	-
dc.identifier.uri	http://hdl.handle.net/10722/339384	-
dc.description.abstract	Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model. However, existing two-stream methods require passing a great number of (up to a hundred) image crops into the visual-language model, which is highly inefficient. To address the problem, we propose a network that only needs a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference.	-
dc.language	eng	-
dc.relation.ispartof	2023 International Conference on Computer Vision (ICCV) (02/10/2023-06/10/2023, Paris)	-
dc.title	Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network	-
dc.type	Conference_Paper	-
