Compressed video contrastive learning

Huo, Y; Ding, M; Lu, H; Fei, N; Lu, Z; Wen, J; Luo, P

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Compressed video contrastive learning

Title	Compressed video contrastive learning
Authors	Huo, Y; Ding, M; Lu, H; Fei, N; Lu, Z; Wen, J; Luo, P
Keywords	Self-supervised learning; Contrastive learning; Representation learning
Issue Date	2021
Publisher	Neural Information Processing Systems Foundation.
Citation	35th Conference on Neural Information Processing Systems (NeurIPS 2021) (Virtual), December 6-14, 2021. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), p. 14176-14187
Abstract	This work concerns self-supervised video representation learning (SSVRL), one topic that has received much attention recently. Since videos are storage-intensive and contain a rich source of visual content, models designed for SSVRL are expected to be storage- and computation-efficient, as well as effective. However, most existing methods only focus on one of the two objectives, failing to consider both at the same time. In this work, for the first time, the seemingly contradictory goals are simultaneously achieved by exploiting compressed videos and capturing mutual information between two input streams. Specifically, a novel Motion Vector based Cross Guidance Contrastive learning approach (MVCGC) is proposed. For storage and computation efficiency, we choose to directly decode RGB frames and motion vectors (that resemble low-resolution optical flows) from compressed videos on-the-fly. To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa. Comprehensive experiments on two downstream tasks show that our MVCGC yields new state-of-the-art while being significantly more efficient than its competitors.
Persistent Identifier	http://hdl.handle.net/10722/315681
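
The abstract describes cross guidance between the RGB and motion-vector streams through a multi-instance InfoNCE loss. The sketch below is one way to read that idea, not the authors' implementation. It assumes each stream provides two embedded views per clip, and that the positive set for a clip in one stream is the clip itself plus its top-k nearest neighbours in the other stream. That mining rule, along with every function name and the temperature value, is hypothetical and does not come from this record.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_matrix(feats):
    """Pairwise cosine similarities between all rows of feats."""
    unit = [l2_normalize(v) for v in feats]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def top_k_positives(guide_sim, k):
    """Assumed mining rule: positives of clip i are i itself plus its
    k most similar other clips according to the guiding stream."""
    n = len(guide_sim)
    positives = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: guide_sim[i][j], reverse=True)
        positives.append({i, *others[:k]})
    return positives


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multi_instance_info_nce(queries, keys, positives, temperature=0.07):
    """Mean over i of -log(sum_{p in P_i} exp(s_ip) / sum_j exp(s_ij)),
    where s_ij is the temperature-scaled cosine similarity."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    losses = []
    for i, q_i in enumerate(q):
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature
                  for k_j in k]
        pos = log_sum_exp([logits[p] for p in sorted(positives[i])])
        losses.append(log_sum_exp(logits) - pos)
    return sum(losses) / len(losses)


def cross_guidance_losses(rgb_a, rgb_b, mv_a, mv_b, k=1):
    """RGB positives are mined from motion vectors, and vice versa."""
    rgb_pos = top_k_positives(cosine_matrix(mv_a), k)
    mv_pos = top_k_positives(cosine_matrix(rgb_a), k)
    return (multi_instance_info_nce(rgb_a, rgb_b, rgb_pos),
            multi_instance_info_nce(mv_a, mv_b, mv_pos))


# Toy embeddings for three clips (made-up numbers, for illustration only).
rgb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mv = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
loss_rgb, loss_mv = cross_guidance_losses(rgb, rgb, mv, mv, k=1)
```

Each loss is non-negative because a clip's positive set is a subset of all keys. If k is set so that every clip counts as a positive, the loss collapses to zero, which is a quick sanity check on the formulation.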

DC Field	Value	Language
dc.contributor.author	Huo, Y	-
dc.contributor.author	Ding, M	-
dc.contributor.author	Lu, H	-
dc.contributor.author	Fei, N	-
dc.contributor.author	Lu, Z	-
dc.contributor.author	Wen, J	-
dc.contributor.author	Luo, P	-
dc.date.accessioned	2022-08-19T09:02:27Z	-
dc.date.available	2022-08-19T09:02:27Z	-
dc.date.issued	2021	-
dc.identifier.citation	35th Conference on Neural Information Processing Systems (NeurIPS 2021) (Virtual), December 6-14, 2021. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), p. 14176-14187	-
dc.identifier.uri	http://hdl.handle.net/10722/315681	-
dc.language	eng	-
dc.publisher	Neural Information Processing Systems Foundation.	-
dc.relation.ispartof	Advances in Neural Information Processing Systems 34 (NeurIPS 2021)	-
dc.subject	Self-supervised learning	-
dc.subject	Contrastive learning	-
dc.subject	Representation learning	-
dc.title	Compressed video contrastive learning	-
dc.type	Conference_Paper	-
dc.identifier.email	Luo, P: pluo@hku.hk	-
dc.identifier.authority	Luo, P=rp02575	-
dc.identifier.hkuros	335595	-
dc.identifier.spage	14176	-
dc.identifier.epage	14187	-
dc.publisher.place	United States	-
