
Conference Paper: Learning Semantic-Aware Dynamics for Video Prediction

Title: Learning Semantic-Aware Dynamics for Video Prediction
Authors: Bei, Xinzhu; Yang, Yanchao; Soatto, Stefano
Issue Date: 2021
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 902-912
Abstract: We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions and capturing the evolution of semantically consistent regions in the video. The scene layout (semantic map) and motion (optical flow) are decomposed into layers, which are predicted and fused with their context to generate future layouts and motions. The appearance of the scene is warped from past frames using the predicted motion in co-visible regions; dis-occluded regions are synthesized with content-aware inpainting utilizing the predicted scene layout. The result is a predictive model that explicitly represents objects and learns their class-specific motion, which we evaluate on video prediction benchmarks.
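The abstract describes the pipeline only at a high level. The NumPy sketch below illustrates the general warp-then-inpaint composition it outlines, under several loud assumptions: the per-class flow layers and the future semantic layout are given as inputs rather than predicted by a network; warping is simple nearest-neighbour backward warping; a pixel counts as dis-occluded only when its source location falls outside the frame; and the "inpainting" is a hypothetical class-mean fill standing in for the paper's learned, content-aware inpainting. All function names (`fuse_layers`, `backward_warp`, `fill_by_class_mean`, `predict_next_frame`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def fuse_layers(semantic_map, class_flows):
    """Compose one dense flow field from per-class flow layers (hypothetical helper).

    semantic_map: (H, W) int array of class ids for the target frame.
    class_flows: dict mapping class id -> (H, W, 2) flow array (dx, dy).
    Pixels whose class has no layer keep zero flow.
    """
    h, w = semantic_map.shape
    flow = np.zeros((h, w, 2), dtype=np.float64)
    for cls, layer in class_flows.items():
        mask = semantic_map == cls
        flow[mask] = layer[mask]
    return flow


def backward_warp(frame, flow):
    """Sample the past frame at (x + dx, y + dy) with nearest-neighbour lookup.

    Returns the warped frame and a validity mask that is False where the
    sample point falls outside the source frame (our stand-in for dis-occlusion).
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(frame)
    warped[valid] = frame[src_y[valid], src_x[valid]]
    return warped, valid


def fill_by_class_mean(warped, valid, semantic_map):
    """Placeholder for learned inpainting: fill each invalid pixel with the mean
    of the valid warped pixels that share its predicted class."""
    out = warped.copy()
    for cls in np.unique(semantic_map):
        cls_mask = semantic_map == cls
        known = cls_mask & valid
        holes = cls_mask & ~valid
        if holes.any() and known.any():
            out[holes] = warped[known].mean(axis=0)
    return out


def predict_next_frame(frame, semantic_map, class_flows):
    """Fuse flow layers, warp co-visible pixels, then fill the rest."""
    flow = fuse_layers(semantic_map, class_flows)
    warped, valid = backward_warp(frame, flow)
    return fill_by_class_mean(warped, valid, semantic_map), valid


# Toy example: a 4x4 grayscale frame with two semantic classes.
frame = np.arange(16, dtype=float).reshape(4, 4)
layout = np.zeros((4, 4), dtype=int)
layout[:, 2:] = 1  # class 1 occupies the two right-most columns
still = np.zeros((4, 4, 2))  # class 0: no motion
shift = np.zeros((4, 4, 2))
shift[..., 0] = 1.0  # class-1 pixels sample one pixel to the right
pred, valid = predict_next_frame(frame, layout, {0: still, 1: shift})
# Column 3 samples x = 4 (outside the frame), so it is filled with 9.0,
# the mean of the valid class-1 pixels (3, 7, 11, 15).
```

Keeping the warp and the fill as separate steps mirrors the split the abstract draws between co-visible regions (copied from past frames) and dis-occluded regions (synthesized using the predicted layout); a real system would replace each placeholder with learned components.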
Persistent Identifier: http://hdl.handle.net/10722/325537
ISSN: 1063-6919
2020 SCImago Journal Rankings: 4.658

 

DC Field: Value
dc.contributor.author: Bei, Xinzhu
dc.contributor.author: Yang, Yanchao
dc.contributor.author: Soatto, Stefano
dc.date.accessioned: 2023-02-27T07:34:06Z
dc.date.available: 2023-02-27T07:34:06Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 902-912
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/325537
dc.description.abstract: (identical to the Abstract above)
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.title: Learning Semantic-Aware Dynamics for Video Prediction
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR46437.2021.00096
dc.identifier.scopus: eid_2-s2.0-85114889118
dc.identifier.spage: 902
dc.identifier.epage: 912
