Article: Patch-Based Separable Transformer for Visual Recognition
Title | Patch-Based Separable Transformer for Visual Recognition |
---|---|
Authors | Sun, SY; Yue, XY; Zhao, HS; Torr, PHS; Bai, S |
Keywords | image classification; instance segmentation; object detection; Transformer |
Issue Date | 1-Jul-2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 7, p. 9241-9247 |
Abstract | The computational complexity of transformers limits their wide deployment in frameworks for visual recognition. Recent work (Dosovitskiy et al. 2021) significantly accelerates network processing by reducing the resolution at the beginning of the network; however, it is still hard to generalize directly to other downstream tasks, e.g. object detection and segmentation, as CNNs do. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization saves computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation. |
Persistent Identifier | http://hdl.handle.net/10722/331713 |
ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rankings: 6.158) |
ISI Accession Number ID | WOS:001004665900085 |
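
The abstract above describes factorizing full spatial self-attention into pixel-wise local attention within patches and patch-wise global attention across patches. The following is a minimal, hypothetical PyTorch-style sketch of such a factorization, not the authors' SeT implementation; the patch size, mean-pooled patch tokens, and all module and parameter names are assumptions made for illustration.

```python
# Hypothetical sketch of factorized attention: pixel-wise local attention
# inside each patch, followed by patch-wise global attention across
# pooled patch tokens. Not the authors' implementation.
import torch
import torch.nn as nn


class FactorizedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, patch_size: int = 7):
        super().__init__()
        self.patch_size = patch_size
        # Attention among the pixels inside one patch (local interactions).
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Attention among pooled patch tokens (global interactions).
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by patch_size.
        B, C, H, W = x.shape
        p = self.patch_size
        nh, nw = H // p, W // p

        # Group pixels into (B * num_patches, p*p, C) sequences, one per patch.
        patches = (
            x.reshape(B, C, nh, p, nw, p)
            .permute(0, 2, 4, 3, 5, 1)            # B, nh, nw, p, p, C
            .reshape(B * nh * nw, p * p, C)
        )

        # Pixel-wise local attention within each patch, with a residual.
        local, _ = self.local_attn(patches, patches, patches)
        local = local + patches

        # One token per patch (mean pooling is an assumption), then
        # patch-wise global attention across all patches of an image.
        tokens = local.mean(dim=1).reshape(B, nh * nw, C)
        glob, _ = self.global_attn(tokens, tokens, tokens)
        tokens = tokens + glob

        # Broadcast each globally refined patch token back to its pixels.
        local = local + tokens.reshape(B * nh * nw, 1, C)

        # Restore the (B, C, H, W) layout.
        return (
            local.reshape(B, nh, nw, p, p, C)
            .permute(0, 5, 1, 3, 2, 4)
            .reshape(B, C, H, W)
        )


if __name__ == "__main__":
    x = torch.randn(2, 64, 28, 28)
    print(FactorizedAttention(dim=64)(x).shape)  # torch.Size([2, 64, 28, 28])
```

The intended point of the sketch is the cost argument from the abstract: attention is computed over p*p pixels per patch and over nh*nw patch tokens per image, rather than over all H*W positions at once, while the local and global paths together preserve interactions at both granularities.
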
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sun, SY | - |
dc.contributor.author | Yue, XY | - |
dc.contributor.author | Zhao, HS | - |
dc.contributor.author | Torr, PHS | - |
dc.contributor.author | Bai, S | - |
dc.date.accessioned | 2023-09-21T06:58:14Z | - |
dc.date.available | 2023-09-21T06:58:14Z | - |
dc.date.issued | 2023-07-01 | - |
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 7, p. 9241-9247 | - |
dc.identifier.issn | 0162-8828 | - |
dc.identifier.uri | http://hdl.handle.net/10722/331713 | - |
dc.description.abstract | The computational complexity of transformers limits their wide deployment in frameworks for visual recognition. Recent work (Dosovitskiy et al. 2021) significantly accelerates network processing by reducing the resolution at the beginning of the network; however, it is still hard to generalize directly to other downstream tasks, e.g. object detection and segmentation, as CNNs do. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization saves computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation. | -
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | image classification | - |
dc.subject | instance segmentation | - |
dc.subject | object detection | - |
dc.subject | Transformer | - |
dc.title | Patch-Based Separable Transformer for Visual Recognition | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TPAMI.2022.3231725 | - |
dc.identifier.pmid | 37015401 | - |
dc.identifier.scopus | eid_2-s2.0-85146254979 | - |
dc.identifier.volume | 45 | - |
dc.identifier.issue | 7 | - |
dc.identifier.spage | 9241 | - |
dc.identifier.epage | 9247 | - |
dc.identifier.eissn | 1939-3539 | - |
dc.identifier.isi | WOS:001004665900085 | - |
dc.publisher.place | LOS ALAMITOS | - |
dc.identifier.issnl | 0162-8828 | - |