Article: Patch-Based Separable Transformer for Visual Recognition
Title | Patch-Based Separable Transformer for Visual Recognition |
---|---|
Authors | Sun, SY; Yue, XY; Zhao, HS; Torr, PHS; Bai, S |
Keywords | image classification; instance segmentation; object detection; Transformer |
Issue Date | 1-Jul-2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 7, p. 9241-9247 |
Abstract | The computational complexity of transformers limits their wide deployment in frameworks for visual recognition. Recent work (Dosovitskiy et al. 2021) significantly accelerates network processing by reducing the resolution at the beginning of the network; however, it is still hard to generalize directly to other downstream tasks, e.g. object detection and segmentation, as CNNs do. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization saves computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation. |
Persistent Identifier | http://hdl.handle.net/10722/331713 |
ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158) |
ISI Accession Number ID | WOS:001004665900085 |
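
The abstract above describes factorizing full spatial self-attention into pixel-wise local attention within patches and patch-wise global attention across patches. The following is a minimal, hypothetical PyTorch-style sketch of such a factorization, not the authors' SeT implementation; the patch size, mean-pooled patch tokens, and all module and parameter names are assumptions made for illustration.

```python
# Hypothetical sketch of factorized attention: pixel-wise local attention
# inside each patch, followed by patch-wise global attention across
# pooled patch tokens. Not the authors' implementation.
import torch
import torch.nn as nn


class FactorizedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, patch_size: int = 7):
        super().__init__()
        self.patch_size = patch_size
        # Attention among the pixels inside one patch (local interactions).
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Attention among pooled patch tokens (global interactions).
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by patch_size.
        B, C, H, W = x.shape
        p = self.patch_size
        nh, nw = H // p, W // p

        # Group pixels into (B * num_patches, p*p, C) sequences, one per patch.
        patches = (
            x.reshape(B, C, nh, p, nw, p)
            .permute(0, 2, 4, 3, 5, 1)            # B, nh, nw, p, p, C
            .reshape(B * nh * nw, p * p, C)
        )

        # Pixel-wise local attention within each patch, with a residual.
        local, _ = self.local_attn(patches, patches, patches)
        local = local + patches

        # One token per patch (mean pooling is an assumption), then
        # patch-wise global attention across all patches of an image.
        tokens = local.mean(dim=1).reshape(B, nh * nw, C)
        glob, _ = self.global_attn(tokens, tokens, tokens)
        tokens = tokens + glob

        # Broadcast each globally refined patch token back to its pixels.
        local = local + tokens.reshape(B * nh * nw, 1, C)

        # Restore the (B, C, H, W) layout.
        return (
            local.reshape(B, nh, nw, p, p, C)
            .permute(0, 5, 1, 3, 2, 4)
            .reshape(B, C, H, W)
        )


if __name__ == "__main__":
    x = torch.randn(2, 64, 28, 28)
    print(FactorizedAttention(dim=64)(x).shape)  # torch.Size([2, 64, 28, 28])
```

The intended point of the sketch is the cost argument from the abstract: attention is computed over p*p pixels per patch and over nh*nw patch tokens per image, rather than over all H*W positions at once, while the local and global paths together preserve interactions at both granularities.
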
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sun, SY | - |
dc.contributor.author | Yue, XY | - |
dc.contributor.author | Zhao, HS | - |
dc.contributor.author | Torr, PHS | - |
dc.contributor.author | Bai, S | - |
dc.date.accessioned | 2023-09-21T06:58:14Z | - |
dc.date.available | 2023-09-21T06:58:14Z | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 7, p. 9241-9247 | - |
dc.identifier.issn | 0162-8828 | - |
dc.identifier.uri | http://hdl.handle.net/10722/331713 | - |
dc.description.abstract | The computational complexity of transformers limits their wide deployment in frameworks for visual recognition. Recent work (Dosovitskiy et al. 2021) significantly accelerates network processing by reducing the resolution at the beginning of the network; however, it is still hard to generalize directly to other downstream tasks, e.g. object detection and segmentation, as CNNs do. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization saves computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation. | -
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | image classification | - |
dc.subject | instance segmentation | - |
dc.subject | object detection | - |
dc.subject | Transformer | - |
dc.title | Patch-Based Separable Transformer for Visual Recognition | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TPAMI.2022.3231725 | - |
dc.identifier.pmid | 37015401 | - |
dc.identifier.scopus | eid_2-s2.0-85146254979 | - |
dc.identifier.volume | 45 | - |
dc.identifier.issue | 7 | - |
dc.identifier.spage | 9241 | - |
dc.identifier.epage | 9247 | - |
dc.identifier.eissn | 1939-3539 | - |
dc.identifier.isi | WOS:001004665900085 | - |
dc.publisher.place | LOS ALAMITOS | - |
dc.identifier.issnl | 0162-8828 | - |