
Conference Paper: ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Title: ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
Authors: Xu, L; Guan, Y; Jin, S; Liu, W; Qian, C; Luo, P; Ouyang, W; Wang, X
Issue Date: 2021
Citation: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 19-25 June 2021, p. 16072-16081
Abstract: Human pose estimation has achieved significant progress in recent years. However, most recent methods focus on improving accuracy with complicated models while ignoring real-time efficiency. To achieve a better trade-off between accuracy and efficiency, we propose a novel neural architecture search (NAS) method, termed ViPNAS, to search networks at both the spatial and temporal levels for fast online video pose estimation. At the spatial level, we carefully design the search space with five different dimensions: network depth, width, kernel size, group number, and attention. At the temporal level, we search over a series of temporal feature fusions to optimize the total accuracy and speed across multiple video frames. To the best of our knowledge, we are the first to search for temporal feature fusion and automatic computation allocation in videos. Extensive experiments demonstrate the effectiveness of our approach on the challenging COCO2017 and PoseTrack2018 datasets. Our discovered model families, S-ViPNAS and T-ViPNAS, achieve significantly higher inference speed (CPU real-time) without sacrificing accuracy compared to previous state-of-the-art methods.
Description: Paper Session Twelve: Paper ID 5887
Persistent Identifier: http://hdl.handle.net/10722/301429
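
As a purely illustrative aid to the abstract's description of the spatial search space, the sketch below shows how its five dimensions (depth, width, kernel size, group number, and attention) could be encoded and sampled in Python. All names and candidate values here are hypothetical assumptions for illustration; this is not the authors' implementation.

    # Hypothetical sketch of a ViPNAS-style spatial search space (not the authors' code).
    # Each network stage samples one choice from each of the five dimensions
    # named in the abstract: depth, width, kernel size, group number, attention.
    import random

    # Candidate values per dimension; the concrete ranges are assumptions.
    # A real search space would also constrain groups to divide the channel width.
    SEARCH_SPACE = {
        "depth":       [2, 3, 4, 6],          # number of blocks in the stage
        "width":       [64, 128, 160, 256],   # output channels of the stage
        "kernel_size": [3, 5, 7],             # spatial convolution kernel size
        "groups":      [1, 2, 4, 8],          # grouped-convolution group count
        "attention":   [None, "SE"],          # optional attention module
    }

    def sample_stage(rng):
        """Draw one stage configuration uniformly from the search space."""
        return {dim: rng.choice(choices) for dim, choices in SEARCH_SPACE.items()}

    def sample_network(num_stages=4, seed=0):
        """Sample a full spatial architecture as a list of stage configurations."""
        rng = random.Random(seed)
        return [sample_stage(rng) for _ in range(num_stages)]

    if __name__ == "__main__":
        for i, stage in enumerate(sample_network()):
            print(f"stage {i}: {stage}")

In a typical weight-sharing NAS setup, configurations sampled this way would be evaluated inside a shared supernet rather than trained from scratch; how ViPNAS itself trains and ranks candidates is detailed in the paper, not in this sketch.
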

 

DC Field: Value
dc.contributor.author: Xu, L
dc.contributor.author: Guan, Y
dc.contributor.author: Jin, S
dc.contributor.author: Liu, W
dc.contributor.author: Qian, C
dc.contributor.author: Luo, P
dc.contributor.author: Ouyang, W
dc.contributor.author: Wang, X
dc.date.accessioned: 2021-07-27T08:10:56Z
dc.date.available: 2021-07-27T08:10:56Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, 19-25 June 2021, p. 16072-16081
dc.identifier.uri: http://hdl.handle.net/10722/301429
dc.description: Paper Session Twelve: Paper ID 5887
dc.description.abstract: Human pose estimation has achieved significant progress in recent years. However, most recent methods focus on improving accuracy with complicated models while ignoring real-time efficiency. To achieve a better trade-off between accuracy and efficiency, we propose a novel neural architecture search (NAS) method, termed ViPNAS, to search networks at both the spatial and temporal levels for fast online video pose estimation. At the spatial level, we carefully design the search space with five different dimensions: network depth, width, kernel size, group number, and attention. At the temporal level, we search over a series of temporal feature fusions to optimize the total accuracy and speed across multiple video frames. To the best of our knowledge, we are the first to search for temporal feature fusion and automatic computation allocation in videos. Extensive experiments demonstrate the effectiveness of our approach on the challenging COCO2017 and PoseTrack2018 datasets. Our discovered model families, S-ViPNAS and T-ViPNAS, achieve significantly higher inference speed (CPU real-time) without sacrificing accuracy compared to previous state-of-the-art methods.
dc.language: eng
dc.relation.ispartof: IEEE Computer Vision and Pattern Recognition (CVPR) Proceedings
dc.title: ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
dc.type: Conference_Paper
dc.identifier.email: Luo, P: pluo@hku.hk
dc.identifier.authority: Luo, P=rp02575
dc.identifier.hkuros: 323747
dc.identifier.spage: 16072
dc.identifier.epage: 16081
