File Download
There are no files associated with this item.
Links for fulltext (may require subscription):
- Publisher Website: 10.3390/s21051656
- Scopus: eid_2-s2.0-85101682101
- PMID: 33670835
- WOS: WOS:000628552700001
Article: AR3D: Attention residual 3D network for human action recognition
Title | AR3D: Attention residual 3D network for human action recognition |
---|---|
Authors | Dong, Min; Fang, Zhenglin; Li, Yongfa; Bi, Sheng; Chen, Jiangcheng |
Keywords | 3D; Action recognition; Attention mechanism; Convolutional neural network; Residual |
Issue Date | 2021 |
Citation | Sensors, 2021, v. 21, n. 5, p. 1-15 |
Abstract | In video-based human action recognition, deep neural networks mainly fall into two branches: 2D convolutional neural networks (CNNs) and 3D CNNs. In a 2D CNN, temporal and spatial feature extraction are independent of each other, so the connection between them is easily lost and recognition performance suffers. Although a 3D CNN can extract the temporal and spatial features of a video sequence simultaneously, its number of parameters increases exponentially, making the model difficult to train and transfer. To address this, this article builds on 3D CNNs, combining a residual structure and an attention mechanism to improve existing 3D CNN models, and proposes two types of human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the number of parameters and strengthens the extraction of temporal features. Second, we explore the application of the attention mechanism to human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human actions. Finally, to make full use of both the residual structure and the attention mechanism, the Attention Residual 3D Network (AR3D) is proposed, and its two fusion strategies and corresponding model structures (AR3D_V1, AR3D_V2) are described in detail. Experiments show that the fused structures yield varying degrees of performance improvement over either single structure. (An illustrative sketch of such a fused residual-plus-attention 3D block follows this table.) |
Persistent Identifier | http://hdl.handle.net/10722/327319 |
ISSN | 1424-8220 (2023 Impact Factor: 3.4; 2023 SCImago Journal Rankings: 0.786) |
ISI Accession Number ID | WOS:000628552700001 |
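
The abstract describes the architecture only at a high level: 3D residual blocks fused with a 3D spatio-temporal attention module. As a rough illustration of how such a fused block can be wired, below is a minimal PyTorch sketch, assuming a generic two-convolution 3D residual block and a simple channel-pooled attention gate; the module names and the exact attention design are placeholders for illustration, not the authors' published R3D/AR3D implementation.

```python
# Illustrative sketch only: a generic 3D residual block with a simple
# spatio-temporal attention gate. This is NOT the paper's exact R3D/AR3D
# design; module names and the gate layout are assumptions.
import torch
import torch.nn as nn


class SpatioTemporalAttention(nn.Module):
    """Re-weights a 3D feature map with a sigmoid gate computed from
    channel-pooled spatio-temporal statistics (an assumed design)."""

    def __init__(self) -> None:
        super().__init__()
        # 2 input maps (avg- and max-pooled over channels) -> 1 attention map.
        self.conv = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)   # (N, 1, T, H, W)
        max_map = x.amax(dim=1, keepdim=True)   # (N, 1, T, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                          # gated features


class Residual3DBlock(nn.Module):
    """A plain 3D residual block: two 3x3x3 convolutions plus a skip
    connection, optionally followed by the attention gate. This is one
    generic way to combine the residual and attention structures that
    the abstract describes, not the paper's exact fusion."""

    def __init__(self, channels: int, use_attention: bool = True) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
        )
        self.attn = SpatioTemporalAttention() if use_attention else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.attn(self.body(x))
        return self.relu(out + x)                # residual connection


if __name__ == "__main__":
    # A clip batch: (batch, channels, frames, height, width).
    clip_features = torch.randn(2, 64, 8, 56, 56)
    block = Residual3DBlock(channels=64)
    print(block(clip_features).shape)            # torch.Size([2, 64, 8, 56, 56])
```

Placing the gate inside the residual branch, before the skip addition, keeps the identity path clean and is one common way to combine attention with residual learning; the paper's AR3D_V1 and AR3D_V2 correspond to two different fusion strategies whose details are given in the article itself.
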
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Dong, Min | - |
dc.contributor.author | Fang, Zhenglin | - |
dc.contributor.author | Li, Yongfa | - |
dc.contributor.author | Bi, Sheng | - |
dc.contributor.author | Chen, Jiangcheng | - |
dc.date.accessioned | 2023-03-31T05:30:29Z | - |
dc.date.available | 2023-03-31T05:30:29Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Sensors, 2021, v. 21, n. 5, p. 1-15 | - |
dc.identifier.issn | 1424-8220 | - |
dc.identifier.uri | http://hdl.handle.net/10722/327319 | - |
dc.description.abstract | In video-based human action recognition, deep neural networks mainly fall into two branches: 2D convolutional neural networks (CNNs) and 3D CNNs. In a 2D CNN, temporal and spatial feature extraction are independent of each other, so the connection between them is easily lost and recognition performance suffers. Although a 3D CNN can extract the temporal and spatial features of a video sequence simultaneously, its number of parameters increases exponentially, making the model difficult to train and transfer. To address this, this article builds on 3D CNNs, combining a residual structure and an attention mechanism to improve existing 3D CNN models, and proposes two types of human action recognition models: the Residual 3D Network (R3D) and the Attention Residual 3D Network (AR3D). First, we propose a shallow feature extraction module and improve the ordinary 3D residual structure, which reduces the number of parameters and strengthens the extraction of temporal features. Second, we explore the application of the attention mechanism to human action recognition and design a 3D spatio-temporal attention module to strengthen the extraction of global features of human actions. Finally, to make full use of both the residual structure and the attention mechanism, the Attention Residual 3D Network (AR3D) is proposed, and its two fusion strategies and corresponding model structures (AR3D_V1, AR3D_V2) are described in detail. Experiments show that the fused structures yield varying degrees of performance improvement over either single structure. | -
dc.language | eng | - |
dc.relation.ispartof | Sensors | - |
dc.subject | 3D | - |
dc.subject | Action recognition | - |
dc.subject | Attention mechanism | - |
dc.subject | Convolutional neural network | - |
dc.subject | Residual | - |
dc.title | AR3D: Attention residual 3D network for human action recognition | -
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.3390/s21051656 | - |
dc.identifier.pmid | 33670835 | - |
dc.identifier.scopus | eid_2-s2.0-85101682101 | - |
dc.identifier.volume | 21 | - |
dc.identifier.issue | 5 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 15 | - |
dc.identifier.isi | WOS:000628552700001 | - |