Conference Paper: Not all tokens are equal: Human-centric visual analysis via token clustering transformer

Title: Not all tokens are equal: Human-centric visual analysis via token clustering transformer
Authors: Zeng, W; Jin, S; Liu, W; Qian, C; Luo, P; Ouyang, W; Wang, X
Issue Date: 2022
Publisher: IEEE Computer Society
Citation: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), New Orleans, Louisiana, USA, 19-24 June, 2022. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, p. 11101-11111
Abstract: Vision transformers have achieved great successes in many computer vision tasks. Most methods generate vision tokens by splitting an image into a regular and fixed grid and treating each cell as a token. However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes. The tokens in TCFormer can not only focus on important areas but also adjust the token shapes to fit the semantic concept and adopt a fine resolution for regions containing critical details, which is beneficial to capturing detailed information. Extensive experiments show that TCFormer consistently outperforms its counterparts on different challenging human-centric tasks and datasets, including whole-body pose estimation on COCO-WholeBody and 3D human mesh reconstruction on 3DPW.
Description: Oral
Persistent Identifier: http://hdl.handle.net/10722/315678

 

DC Field | Value | Language
dc.contributor.author | Zeng, W | -
dc.contributor.author | Jin, S | -
dc.contributor.author | Liu, W | -
dc.contributor.author | Qian, C | -
dc.contributor.author | Luo, P | -
dc.contributor.author | Ouyang, W | -
dc.contributor.author | Wang, X | -
dc.date.accessioned | 2022-08-19T09:02:24Z | -
dc.date.available | 2022-08-19T09:02:24Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Virtual), New Orleans, Louisiana, USA, 19-24 June, 2022. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, p. 11101-11111 | -
dc.identifier.uri | http://hdl.handle.net/10722/315678 | -
dc.description | Oral | -
dc.description.abstract | Vision transformers have achieved great successes in many computer vision tasks. Most methods generate vision tokens by splitting an image into a regular and fixed grid and treating each cell as a token. However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes. The tokens in TCFormer can not only focus on important areas but also adjust the token shapes to fit the semantic concept and adopt a fine resolution for regions containing critical details, which is beneficial to capturing detailed information. Extensive experiments show that TCFormer consistently outperforms its counterparts on different challenging human-centric tasks and datasets, including whole-body pose estimation on COCO-WholeBody and 3D human mesh reconstruction on 3DPW. | -
dc.language | eng | -
dc.publisher | IEEE Computer Society. | -
dc.relation.ispartof | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 | -
dc.rights | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Copyright © IEEE Computer Society. | -
dc.title | Not all tokens are equal: Human-centric visual analysis via token clustering transformer | -
dc.type | Conference_Paper | -
dc.identifier.email | Luo, P: pluo@hku.hk | -
dc.identifier.authority | Luo, P=rp02575 | -
dc.identifier.hkuros | 335587 | -
dc.identifier.spage | 11101 | -
dc.identifier.epage | 11111 | -
dc.publisher.place | United States | -
