Links for fulltext
(May Require Subscription)
- Publisher Website: https://doi.org/10.1145/3240508.3240534
- Scopus: eid_2-s2.0-85058218893
- WOS: WOS:000509665700030
Conference Paper: Temporal sequence distillation: Towards few-frame action recognition in videos
Title | Temporal sequence distillation: Towards few-frame action recognition in videos |
---|---|
Authors | Zhang, Zhaoyang; Kuang, Zhanghui; Luo, Ping; Feng, Litong; Zhang, Wei |
Keywords | Video Action Recognition; Temporal Sequence Distillation |
Issue Date | 2018 |
Citation | MM 2018 - Proceedings of the 2018 ACM Multimedia Conference, 2018, p. 257-264 |
Abstract | © 2018 Association for Computing Machinery. Video Analytics Software as a Service (VA SaaS) has grown rapidly in recent years. VA SaaS is typically accessed through a lightweight client. Because the transmission bandwidth between the client and the cloud is usually limited and expensive, cloud video analysis algorithms designed for a limited data transmission budget bring great benefits. Although considerable research has been devoted to video analysis, to the best of our knowledge, little of it has addressed the transmission bandwidth limitation in SaaS. As a first attempt in this direction, this work introduces the problem of few-frame action recognition, which aims to maintain high recognition accuracy while accessing only a few frames during both training and testing. Unlike previous work that processes dense frames, we present Temporal Sequence Distillation (TSD), which distills a long video sequence into a very short one for transmission. Trained end-to-end with 3D CNNs for video action recognition, TSD learns a compact and discriminative temporal and spatial representation of video frames. On the Kinetics dataset, TSD+I3D typically requires only 50% of the frames used by I3D [1], a state-of-the-art video action recognition algorithm, to achieve almost the same accuracy. The proposed TSD has three appealing advantages. First, TSD has a lightweight architecture and can be deployed on the client, e.g., mobile devices, to produce compressed representative frames that save transmission bandwidth. Second, TSD significantly reduces the computation needed to run video action recognition on compressed frames in the cloud, while maintaining high recognition accuracy. Third, TSD can be plugged in as a preprocessing module for any existing 3D CNN. Extensive experiments show the effectiveness and characteristics of TSD. |
Persistent Identifier | http://hdl.handle.net/10722/273739 |
ISI Accession Number ID | WOS:000509665700030 |
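The abstract describes TSD as a lightweight module that distills a long sequence of T frames into a much shorter sequence of K representative frames before they are fed to a 3D CNN. The paper's actual architecture is not reproduced in this record; as a purely hypothetical illustration of the general idea, the sketch below produces each of the K output frames as a convex (softmax-weighted) combination of the T input frames. The `distill_frames` function and the `logits` input are assumptions for illustration — in the real model the combination weights would come from a small learned network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax used to turn scores into mixture weights.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_frames(video, logits):
    """Distill T frames into K representative frames.

    video:  (T, H, W, C) input frame sequence.
    logits: (K, T) scores; in a learned model these would be produced
            by a small network over the clip, here they are given directly.
    Returns (K, H, W, C): each output frame is a convex combination of
    the T input frames, weighted by a softmax over the temporal axis.
    """
    weights = softmax(logits, axis=1)   # (K, T), each row sums to 1
    T = video.shape[0]
    flat = video.reshape(T, -1)         # (T, H*W*C)
    distilled = weights @ flat          # (K, H*W*C)
    return distilled.reshape((weights.shape[0],) + video.shape[1:])

# Example: distill a 16-frame clip down to 4 frames for transmission.
rng = np.random.default_rng(0)
video = rng.random((16, 8, 8, 3)).astype(np.float32)
logits = rng.standard_normal((4, 16))
short_clip = distill_frames(video, logits)
print(short_clip.shape)  # (4, 8, 8, 3)
```

Because the weights are non-negative and sum to one, each distilled frame stays in the value range of the input frames; the downstream 3D CNN then consumes the (K, H, W, C) clip exactly as it would a raw one, which matches the abstract's claim that such a module can act as a drop-in preprocessing step.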
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Zhaoyang | - |
dc.contributor.author | Kuang, Zhanghui | - |
dc.contributor.author | Luo, Ping | - |
dc.contributor.author | Feng, Litong | - |
dc.contributor.author | Zhang, Wei | - |
dc.date.accessioned | 2019-08-12T09:56:31Z | - |
dc.date.available | 2019-08-12T09:56:31Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | MM 2018 - Proceedings of the 2018 ACM Multimedia Conference, 2018, p. 257-264 | - |
dc.identifier.uri | http://hdl.handle.net/10722/273739 | - |
dc.description.abstract | © 2018 Association for Computing Machinery. Video Analytics Software as a Service (VA SaaS) has grown rapidly in recent years. VA SaaS is typically accessed through a lightweight client. Because the transmission bandwidth between the client and the cloud is usually limited and expensive, cloud video analysis algorithms designed for a limited data transmission budget bring great benefits. Although considerable research has been devoted to video analysis, to the best of our knowledge, little of it has addressed the transmission bandwidth limitation in SaaS. As a first attempt in this direction, this work introduces the problem of few-frame action recognition, which aims to maintain high recognition accuracy while accessing only a few frames during both training and testing. Unlike previous work that processes dense frames, we present Temporal Sequence Distillation (TSD), which distills a long video sequence into a very short one for transmission. Trained end-to-end with 3D CNNs for video action recognition, TSD learns a compact and discriminative temporal and spatial representation of video frames. On the Kinetics dataset, TSD+I3D typically requires only 50% of the frames used by I3D [1], a state-of-the-art video action recognition algorithm, to achieve almost the same accuracy. The proposed TSD has three appealing advantages. First, TSD has a lightweight architecture and can be deployed on the client, e.g., mobile devices, to produce compressed representative frames that save transmission bandwidth. Second, TSD significantly reduces the computation needed to run video action recognition on compressed frames in the cloud, while maintaining high recognition accuracy. Third, TSD can be plugged in as a preprocessing module for any existing 3D CNN. Extensive experiments show the effectiveness and characteristics of TSD. | -
dc.language | eng | - |
dc.relation.ispartof | MM 2018 - Proceedings of the 2018 ACM Multimedia Conference | - |
dc.subject | Video Action Recognition | - |
dc.subject | Temporal Sequence Distillation | - |
dc.title | Temporal sequence distillation: Towards few-frame action recognition in videos | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3240508.3240534 | - |
dc.identifier.scopus | eid_2-s2.0-85058218893 | - |
dc.identifier.spage | 257 | - |
dc.identifier.epage | 264 | - |
dc.identifier.isi | WOS:000509665700030 | - |