File Download
Links for fulltext (may require subscription):
- Publisher Website: 10.1007/s44196-023-00292-9
- Scopus: eid_2-s2.0-85165391112
Citations:
- Scopus: 0
Article: Integrating Vision Transformer-Based Bilinear Pooling and Attention Network Fusion of RGB and Skeleton Features for Human Action Recognition
Title | Integrating Vision Transformer-Based Bilinear Pooling and Attention Network Fusion of RGB and Skeleton Features for Human Action Recognition |
---|---|
Authors | Sun, Yaohui; Xu, Weiyao; Yu, Xiaoyi; Gao, Ju; Xia, Ting |
Keywords | Feature fusion; Human action recognition; Multi-modal; Self-attention |
Issue Date | 20-Jul-2023 |
Publisher | Atlantis Press |
Citation | International Journal of Computational Intelligence Systems, 2023, v. 16, n. 1, p. 1-11 |
Abstract | In this paper, we propose VT-BPAN, a novel approach that combines the capabilities of Vision Transformer (VT), bilinear pooling, and attention network fusion for effective human action recognition (HAR). The proposed methodology significantly enhances the accuracy of activity recognition through the following advancements: (1) The introduction of an effective two-stream feature pooling and fusion mechanism that combines RGB frames and skeleton data to augment the spatial–temporal feature representation. (2) The development of a spatial lightweight vision transformer that mitigates computational costs. The evaluation of this framework encompasses three widely employed video action datasets, demonstrating that the proposed approach achieves performance on par with state-of-the-art methods. |
Persistent Identifier | http://hdl.handle.net/10722/345484 |
ISSN | 1875-6891 (2023 Impact Factor: 2.5; 2023 SCImago Journal Rankings: 0.564) |
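As a rough illustration of the two fusion techniques the abstract names, the sketch below combines an RGB feature vector and a skeleton feature vector by bilinear pooling (outer product with the signed square-root and L2 normalisation commonly applied in such pipelines), then fuses per-stream descriptors with a softmax attention weighting. All function names, shapes, and post-processing choices here are illustrative assumptions, not the authors' VT-BPAN implementation:

```python
import math

def bilinear_pool(rgb_feat, skel_feat):
    # Outer product of the two modality vectors, flattened to one descriptor.
    fused = [r * s for r in rgb_feat for s in skel_feat]
    # Signed square-root, then L2 normalisation (standard bilinear-pooling
    # post-processing; assumed here, not stated in the abstract).
    fused = [math.copysign(math.sqrt(abs(x)), x) for x in fused]
    norm = math.sqrt(sum(x * x for x in fused)) or 1.0
    return [x / norm for x in fused]

def attention_fuse(streams, scores):
    # Softmax over per-stream relevance scores, then a weighted sum of the
    # stream feature vectors (all vectors share the same dimension).
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    w = [x / total for x in w]
    dim = len(streams[0])
    return [sum(w[i] * streams[i][j] for i in range(len(streams)))
            for j in range(dim)]
```

For example, pooling a length-2 RGB feature with a length-3 skeleton feature yields a unit-norm descriptor of length 6, and `attention_fuse` with equal scores reduces to a plain average of the streams.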
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sun, Yaohui | - |
dc.contributor.author | Xu, Weiyao | - |
dc.contributor.author | Yu, Xiaoyi | - |
dc.contributor.author | Gao, Ju | - |
dc.contributor.author | Xia, Ting | - |
dc.date.accessioned | 2024-08-27T09:09:02Z | - |
dc.date.available | 2024-08-27T09:09:02Z | - |
dc.date.issued | 2023-07-20 | - |
dc.identifier.citation | International Journal of Computational Intelligence Systems, 2023, v. 16, n. 1, p. 1-11 | - |
dc.identifier.issn | 1875-6891 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345484 | - |
dc.description.abstract | <p>In this paper, we propose VT-BPAN, a novel approach that combines the capabilities of Vision Transformer (VT), bilinear pooling, and attention network fusion for effective human action recognition (HAR). The proposed methodology significantly enhances the accuracy of activity recognition through the following advancements: (1) The introduction of an effective two-stream feature pooling and fusion mechanism that combines RGB frames and skeleton data to augment the spatial–temporal feature representation. (2) The development of a spatial lightweight vision transformer that mitigates computational costs. The evaluation of this framework encompasses three widely employed video action datasets, demonstrating that the proposed approach achieves performance on par with state-of-the-art methods.</p> | - |
dc.language | eng | - |
dc.publisher | Atlantis Press | - |
dc.relation.ispartof | International Journal of Computational Intelligence Systems | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | Feature fusion | - |
dc.subject | Human action recognition | - |
dc.subject | Multi-modal | - |
dc.subject | Self-attention | - |
dc.title | Integrating Vision Transformer-Based Bilinear Pooling and Attention Network Fusion of RGB and Skeleton Features for Human Action Recognition | - |
dc.type | Article | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1007/s44196-023-00292-9 | - |
dc.identifier.scopus | eid_2-s2.0-85165391112 | - |
dc.identifier.volume | 16 | - |
dc.identifier.issue | 1 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 11 | - |
dc.identifier.eissn | 1875-6883 | - |
dc.identifier.issnl | 1875-6883 | - |