Object detection in videos with tubelet proposal networks

Kang, Kai; Li, Hongsheng; Xiao, Tong; Ouyang, Wanli; Yan, Junjie; Liu, Xihui; Wang, Xiaogang

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/CVPR.2017.101
Scopus: eid_2-s2.0-85041925966
WOS: WOS:000418371400094

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Electrical & Electronic Engineering: Conference papers

Conference Paper: Object detection in videos with tubelet proposal networks

Title	Object detection in videos with tubelet proposal networks
Authors	Kang, Kai; Li, Hongsheng; Xiao, Tong; Ouyang, Wanli; Yan, Junjie; Liu, Xihui; Wang, Xiaogang
Issue Date	2017
Citation	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, v. 2017-January, p. 889-897. DOI: http://dx.doi.org/10.1109/CVPR.2017.101
Abstract	Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Different from object detection in static images, temporal information in videos is vital for object detection. To fully utilize temporal information, state-of-the-art methods [15, 14] are based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. However, the existing methods have major limitations in generating tubelets in terms of quality and efficiency. Motion-based [14] methods are able to obtain dense tubelets efficiently, but the lengths are generally only several frames, which is not optimal for incorporating long-term temporal information. Appearance-based [15] methods, usually involving generic object tracking, could generate long tubelets, but are usually computationally expensive. In this work, we propose a framework for object detection in videos, which consists of a novel tubelet proposal network to efficiently generate spatiotemporal proposals, and a Long Short-term Memory (LSTM) network that incorporates temporal information from tubelet proposals for achieving high object detection accuracy in videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed framework for object detection in videos.
Persistent Identifier	http://hdl.handle.net/10722/316490
ISI Accession Number ID	WOS:000418371400094
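
The abstract describes tubelets as sequences of associated bounding boxes across time. As a rough illustration of that definition only, and not the authors' tubelet proposal network or LSTM, a tubelet could be represented as below. The names `Box`, `Tubelet`, and `iou` are hypothetical and do not come from the paper; the overlap function is a common way to compare boxes and is included only as an assumption about how one might measure agreement between consecutive frames.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class Tubelet:
    """A sequence of associated boxes, one per consecutive frame."""
    start_frame: int
    boxes: List[Box]

    def __len__(self) -> int:
        # Temporal length of the tubelet in frames.
        return len(self.boxes)

    def box_at(self, frame: int) -> Box:
        # Look up the box for an absolute frame index.
        return self.boxes[frame - self.start_frame]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A two-frame tubelet starting at frame 10, with the object shifting right.
tubelet = Tubelet(start_frame=10, boxes=[(0, 0, 10, 10), (1, 0, 11, 10)])
overlap = iou(tubelet.box_at(10), tubelet.box_at(11))
```

In this sketch a longer `boxes` list corresponds to the "long tubelets" the abstract associates with long-term temporal information.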

DC Field	Value	Language
dc.contributor.author	Kang, Kai	-
dc.contributor.author	Li, Hongsheng	-
dc.contributor.author	Xiao, Tong	-
dc.contributor.author	Ouyang, Wanli	-
dc.contributor.author	Yan, Junjie	-
dc.contributor.author	Liu, Xihui	-
dc.contributor.author	Wang, Xiaogang	-
dc.date.accessioned	2022-09-14T11:40:34Z	-
dc.date.available	2022-09-14T11:40:34Z	-
dc.date.issued	2017	-
dc.identifier.citation	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, v. 2017-January, p. 889-897	-
dc.identifier.uri	http://hdl.handle.net/10722/316490	-
dc.description.abstract	Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Different from object detection in static images, temporal information in videos is vital for object detection. To fully utilize temporal information, state-of-the-art methods [15, 14] are based on spatiotemporal tubelets, which are essentially sequences of associated bounding boxes across time. However, the existing methods have major limitations in generating tubelets in terms of quality and efficiency. Motion-based [14] methods are able to obtain dense tubelets efficiently, but the lengths are generally only several frames, which is not optimal for incorporating long-term temporal information. Appearance-based [15] methods, usually involving generic object tracking, could generate long tubelets, but are usually computationally expensive. In this work, we propose a framework for object detection in videos, which consists of a novel tubelet proposal network to efficiently generate spatiotemporal proposals, and a Long Short-term Memory (LSTM) network that incorporates temporal information from tubelet proposals for achieving high object detection accuracy in videos. Experiments on the large-scale ImageNet VID dataset demonstrate the effectiveness of the proposed framework for object detection in videos.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017	-
dc.title	Object detection in videos with tubelet proposal networks	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/CVPR.2017.101	-
dc.identifier.scopus	eid_2-s2.0-85041925966	-
dc.identifier.volume	2017-January	-
dc.identifier.spage	889	-
dc.identifier.epage	897	-
dc.identifier.isi	WOS:000418371400094	-
