Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/TCSVT.2021.3101847
- Scopus: eid_2-s2.0-85112593687
- Web of Science: WOS:000725812500016
Article: GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition

Title | GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition |
---|---|
Authors | Liu, Jiaheng; Xu, Dong |
Keywords | 3D action recognition; Point cloud; two-stream |
Issue Date | 2021 |
Citation | IEEE Transactions on Circuits and Systems for Video Technology, 2021, v. 31, n. 12, p. 4711-4721 |
Abstract | In this work, we propose a strong two-stream baseline method referred to as GeometryMotion-Net for 3D action recognition. For efficient 3D action recognition, we first represent each point cloud sequence as a limited number of randomly sampled frames with each frame consisting of a sparse set of points. After that, we propose a new two-stream framework for effective 3D action recognition. For the geometry stream, we propose a new module to produce a virtual overall geometry point cloud by first merging all 3D points from these selected frames, and then we exploit local neighborhood information of each point in the feature space. In the motion stream, for any two neighboring point cloud frames, we also propose a new module to generate one virtual forward motion point cloud and one virtual backward motion point cloud. Specifically, for each point in the current frame, we first produce a set of 3D offset features relative to the neighboring points in the reference frame (i.e., the previous/subsequent frame) and then exploit local neighborhood information of this point in the offset feature space. Based on the newly generated virtual overall geometry point cloud and multiple virtual forward/backward motion point clouds, any existing point cloud analysis methods (e.g., PointNet) can be readily adopted for extracting discriminant geometry and bidirectional motion features in the geometry and motion streams, respectively, which are further aggregated to make our two-stream network trainable in an end-to-end fashion. Comprehensive experiments on both large-scale datasets (i.e., NTU RGB+D 60 and NTU RGB+D 120) and small-scale datasets (i.e., N-UCLA and UWA3DII) demonstrate the effectiveness and efficiency of our two-stream network for 3D action recognition. |
Persistent Identifier | http://hdl.handle.net/10722/321956 |
ISSN | 1051-8215 (2023 Impact Factor: 8.3; 2023 SCImago Journal Rankings: 2.299) |
ISI Accession Number ID | WOS:000725812500016 |
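The two input constructions described in the abstract — merging all sampled frames into one virtual overall geometry point cloud, and building forward/backward 3D offset features between neighboring frames — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame count, point count, neighborhood size `k`, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy input: T randomly sampled frames, each a sparse set of
# N 3-D points (the paper samples a limited number of frames per sequence).
T, N = 4, 128
frames = [rng.standard_normal((N, 3)) for _ in range(T)]

def virtual_geometry_cloud(frames):
    """Geometry-stream input: merge all 3-D points from the selected
    frames into one virtual overall geometry point cloud."""
    return np.concatenate(frames, axis=0)          # shape (T*N, 3)

def motion_offsets(cur, ref, k=8):
    """Motion-stream input (sketch): for each point in the current frame,
    3-D offsets to its k nearest neighbors in the reference frame
    (previous frame -> forward motion, next frame -> backward motion)."""
    # Pairwise squared distances between current and reference points.
    d2 = ((cur[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]             # k nearest reference points
    return ref[nn] - cur[:, None, :]               # offsets, shape (N, k, 3)

geometry = virtual_geometry_cloud(frames)                                  # (512, 3)
forward = [motion_offsets(frames[t], frames[t - 1]) for t in range(1, T)]  # T-1 clouds
backward = [motion_offsets(frames[t], frames[t + 1]) for t in range(T - 1)]

print(geometry.shape)    # (512, 3)
print(forward[0].shape)  # (128, 8, 3)
```

In the paper, these virtual geometry and motion point clouds are then fed to an off-the-shelf point cloud backbone (e.g., PointNet) in each stream; the sketch above only covers the input construction step.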
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Jiaheng | - |
dc.contributor.author | Xu, Dong | - |
dc.date.accessioned | 2022-11-03T02:22:37Z | - |
dc.date.available | 2022-11-03T02:22:37Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | IEEE Transactions on Circuits and Systems for Video Technology, 2021, v. 31, n. 12, p. 4711-4721 | - |
dc.identifier.issn | 1051-8215 | - |
dc.identifier.uri | http://hdl.handle.net/10722/321956 | - |
dc.description.abstract | In this work, we propose a strong two-stream baseline method referred to as GeometryMotion-Net for 3D action recognition. For efficient 3D action recognition, we first represent each point cloud sequence as a limited number of randomly sampled frames with each frame consisting of a sparse set of points. After that, we propose a new two-stream framework for effective 3D action recognition. For the geometry stream, we propose a new module to produce a virtual overall geometry point cloud by first merging all 3D points from these selected frames, and then we exploit local neighborhood information of each point in the feature space. In the motion stream, for any two neighboring point cloud frames, we also propose a new module to generate one virtual forward motion point cloud and one virtual backward motion point cloud. Specifically, for each point in the current frame, we first produce a set of 3D offset features relative to the neighboring points in the reference frame (i.e., the previous/subsequent frame) and then exploit local neighborhood information of this point in the offset feature space. Based on the newly generated virtual overall geometry point cloud and multiple virtual forward/backward motion point clouds, any existing point cloud analysis methods (e.g., PointNet) can be readily adopted for extracting discriminant geometry and bidirectional motion features in the geometry and motion streams, respectively, which are further aggregated to make our two-stream network trainable in an end-to-end fashion. Comprehensive experiments on both large-scale datasets (i.e., NTU RGB+D 60 and NTU RGB+D 120) and small-scale datasets (i.e., N-UCLA and UWA3DII) demonstrate the effectiveness and efficiency of our two-stream network for 3D action recognition. | - |
dc.language | eng | - |
dc.relation.ispartof | IEEE Transactions on Circuits and Systems for Video Technology | - |
dc.subject | 3D action recognition | - |
dc.subject | Point cloud | - |
dc.subject | two-stream | - |
dc.title | GeometryMotion-Net: A Strong Two-Stream Baseline for 3D Action Recognition | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TCSVT.2021.3101847 | - |
dc.identifier.scopus | eid_2-s2.0-85112593687 | - |
dc.identifier.volume | 31 | - |
dc.identifier.issue | 12 | - |
dc.identifier.spage | 4711 | - |
dc.identifier.epage | 4721 | - |
dc.identifier.eissn | 1558-2205 | - |
dc.identifier.isi | WOS:000725812500016 | - |