
Article: TCFormer: Visual Recognition via Token Clustering Transformer

Title: TCFormer: Visual Recognition via Token Clustering Transformer
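Authors: Zeng, Wang; Jin, Sheng; Xu, Lumin; Liu, Wentao; Qian, Chen; Ouyang, Wanli; Luo, Ping; Wang, Xiaogang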
Keywords: dynamic token; human pose estimation; image classification; object detection; semantic segmentation; Semantics; Shape; Task analysis; Transformers; Vision transformer
Issue Date: 1-Jan-2024
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, p. 1-16
Abstract: Transformers are widely used in computer vision and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, such a fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens have two key characteristics: (1) they represent image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) they concentrate on regions with valuable details and represent them using fine tokens. Through extensive experiments across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.
Persistent Identifier: http://hdl.handle.net/10722/348565
ISSN: 0162-8828 (print); 1939-3539 (electronic)
2023 Impact Factor: 20.8
2023 SCImago Journal Rankings: 6.158


DC Field | Value | Language
dc.contributor.author | Zeng, Wang | -
dc.contributor.author | Jin, Sheng | -
dc.contributor.author | Xu, Lumin | -
dc.contributor.author | Liu, Wentao | -
dc.contributor.author | Qian, Chen | -
dc.contributor.author | Ouyang, Wanli | -
dc.contributor.author | Luo, Ping | -
dc.contributor.author | Wang, Xiaogang | -
dc.date.accessioned | 2024-10-10T00:31:37Z | -
dc.date.available | 2024-10-10T00:31:37Z | -
dc.date.issued | 2024-01-01 | -
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, p. 1-16 | -
dc.identifier.issn | 0162-8828 | -
dc.identifier.uri | http://hdl.handle.net/10722/348565 | -
dc.language | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | dynamic token | -
dc.subject | human pose estimation | -
dc.subject | Image classification | -
dc.subject | image classification | -
dc.subject | object detection | -
dc.subject | Object detection | -
dc.subject | semantic segmentation | -
dc.subject | Semantic segmentation | -
dc.subject | Semantics | -
dc.subject | Shape | -
dc.subject | Task analysis | -
dc.subject | Transformers | -
dc.subject | Vision transformer | -
dc.title | TCFormer: Visual Recognition via Token Clustering Transformer | -
dc.type | Article | -
dc.identifier.doi | 10.1109/TPAMI.2024.3425768 | -
dc.identifier.scopus | eid_2-s2.0-85198376786 | -
dc.identifier.spage | 1 | -
dc.identifier.epage | 16 | -
dc.identifier.eissn | 1939-3539 | -
dc.identifier.issnl | 0162-8828 | -
