Locality-constrained spatial transformer network for video crowd counting

Fang, Yanyan; Zhan, Biyun; Cai, Wandi; Gao, Shenghua; Hu, Bo

Publisher Website: 10.1109/ICME.2019.00145
Scopus: eid_2-s2.0-85071039008
WOS: WOS:000501820600137
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Locality-constrained spatial transformer network for video crowd counting

Title	Locality-constrained spatial transformer network for video crowd counting
Authors	Fang, Yanyan; Zhan, Biyun; Cai, Wandi; Gao, Shenghua; Hu, Bo
Keywords	Convolutional neural network; Locality constrained; Spatial transformer network; Video crowd counting
Issue Date	2019
Citation	Proceedings - IEEE International Conference on Multimedia and Expo, 2019, v. 2019-July, p. 814-819. DOI: http://dx.doi.org/10.1109/ICME.2019.00145
Abstract	Compared with single-image-based crowd counting, video provides spatial-temporal information about the crowd that can help improve the robustness of crowd counting. However, translation, rotation and scaling of people change the density map of heads between neighbouring frames. Meanwhile, people walking in or out of the scene, or being occluded, in dynamic scenes change the head count. To alleviate these issues in video crowd counting, a Locality-constrained Spatial Transformer Network (LSTN) is proposed. Specifically, we first leverage a convolutional neural network to estimate the density map for each frame. Then, to relate the density maps of neighbouring frames, a Locality-constrained Spatial Transformer (LST) module is introduced to estimate the density map of the next frame from that of the current frame. To facilitate performance evaluation, a large-scale video crowd counting dataset is collected, which contains 15K frames with about 394K annotated heads captured from 13 different scenes. As far as we know, it is the largest video crowd counting dataset. Extensive experiments on our dataset and other crowd counting datasets validate the effectiveness of our LSTN for crowd counting. Our dataset is released at https://github.com/sweetyy83/Lstn-fdst-dataset.
Persistent Identifier	http://hdl.handle.net/10722/344990
ISSN	1945-7871 (2020 SCImago Journal Rankings: 0.368)
ISI Accession Number ID	WOS:000501820600137
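
The abstract above describes two steps: estimating a density map per frame, then relating neighbouring frames by transforming the current frame's density map into an estimate of the next one. The following is a minimal sketch of that second idea only, under stated assumptions: it is not the authors' LSTN, it omits the CNN and the locality constraint, and the names `head_count` and `affine_warp`, the 2x3 matrix `theta`, and the pixel-coordinate convention are all hypothetical choices made for illustration. It relies on the standard convention that a density map sums to the number of heads it represents.

```python
# Illustrative sketch only: not the authors' LSTN implementation.
import math
from typing import List

DensityMap = List[List[float]]


def head_count(density: DensityMap) -> float:
    """Sum a density map; by convention the sum is the estimated head count."""
    return sum(sum(row) for row in density)


def affine_warp(density: DensityMap, theta: List[List[float]]) -> DensityMap:
    """Warp a density map with a 2x3 affine matrix using bilinear sampling.

    Assumption: theta maps each OUTPUT pixel (x, y) to a SOURCE location
    in pixel coordinates, in the spirit of a spatial transformer's
    sampling grid. Samples that fall outside the map contribute zero.
    """
    h, w = len(density), len(density[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Source location for this output pixel.
            sx = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            sy = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0
            value = 0.0
            # Blend the four neighbouring source pixels.
            for yy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
                for xx, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
                    if 0 <= yy < h and 0 <= xx < w:
                        value += wy * wx * density[yy][xx]
            out[y][x] = value
    return out


# Toy current-frame density map: a single head at row 1, column 1.
current = [[0.0] * 4 for _ in range(4)]
current[1][1] = 1.0

# Hypothetical transform: the head moves one pixel to the right,
# so each output pixel samples the source one pixel to its left.
theta = [[1.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]

predicted_next = affine_warp(current, theta)
```

In this toy case the warp moves the density mass from column 1 to column 2 and leaves the total count unchanged. A learned model would predict the transform parameters rather than fix them by hand, and a count that changes between frames (people entering, leaving, or being occluded) is exactly what a pure warp cannot express.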

DC Field	Value	Language
dc.contributor.author	Fang, Yanyan	-
dc.contributor.author	Zhan, Biyun	-
dc.contributor.author	Cai, Wandi	-
dc.contributor.author	Gao, Shenghua	-
dc.contributor.author	Hu, Bo	-
dc.date.accessioned	2024-08-15T09:24:32Z	-
dc.date.available	2024-08-15T09:24:32Z	-
dc.date.issued	2019	-
dc.identifier.citation	Proceedings - IEEE International Conference on Multimedia and Expo, 2019, v. 2019-July, p. 814-819	-
dc.identifier.issn	1945-7871	-
dc.identifier.uri	http://hdl.handle.net/10722/344990	-
dc.description.abstract	Compared with single-image-based crowd counting, video provides spatial-temporal information about the crowd that can help improve the robustness of crowd counting. However, translation, rotation and scaling of people change the density map of heads between neighbouring frames. Meanwhile, people walking in or out of the scene, or being occluded, in dynamic scenes change the head count. To alleviate these issues in video crowd counting, a Locality-constrained Spatial Transformer Network (LSTN) is proposed. Specifically, we first leverage a convolutional neural network to estimate the density map for each frame. Then, to relate the density maps of neighbouring frames, a Locality-constrained Spatial Transformer (LST) module is introduced to estimate the density map of the next frame from that of the current frame. To facilitate performance evaluation, a large-scale video crowd counting dataset is collected, which contains 15K frames with about 394K annotated heads captured from 13 different scenes. As far as we know, it is the largest video crowd counting dataset. Extensive experiments on our dataset and other crowd counting datasets validate the effectiveness of our LSTN for crowd counting. Our dataset is released at https://github.com/sweetyy83/Lstn-fdst-dataset.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings - IEEE International Conference on Multimedia and Expo	-
dc.subject	Convolutional neural network	-
dc.subject	Locality constrained	-
dc.subject	Spatial transformer network	-
dc.subject	Video crowd counting	-
dc.title	Locality-constrained spatial transformer network for video crowd counting	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/ICME.2019.00145	-
dc.identifier.scopus	eid_2-s2.0-85071039008	-
dc.identifier.volume	2019-July	-
dc.identifier.spage	814	-
dc.identifier.epage	819	-
dc.identifier.eissn	1945-788X	-
dc.identifier.isi	WOS:000501820600137	-
