Conference Paper: Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Title | Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video |
---|---|
Authors | Chen, Z; Ma, L; Luo, W; Wong, KKY |
Issue Date | 2019 |
Publisher | Association for Computational Linguistics |
Citation | Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July - 2 August 2019, p. 1884–1894 |
Abstract | In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video. Specifically, given a natural sentence and a video, we localize a spatio-temporal tube in the video that semantically corresponds to the given sentence, with no reliance on any spatio-temporal annotations during training. First, a set of spatiotemporal tubes, referred to as instances, are extracted from the video. We then encode these instances and the sentence using our proposed attentive interactor which can exploit their fine-grained relationships to characterize their matching behaviors. Besides a ranking loss, a novel diversity loss is introduced to train the proposed attentive interactor to strengthen the matching behaviors of reliable instance-sentence pairs and penalize the unreliable ones. Moreover, we also contribute a dataset, called VID-sentence, based on the ImageNet video object detection dataset, to serve as a benchmark for our task. Extensive experimental results demonstrate the superiority of our model over the baseline approaches. Our code and the constructed VID-sentence dataset are available at: https://github.com/JeffCHEN2017/WSSTG.git. |
Description | Session 3E: Vision, Robotics, Multimodal, Grounding and Speech |
Persistent Identifier | http://hdl.handle.net/10722/272014 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chen, Z | - |
dc.contributor.author | Ma, L | - |
dc.contributor.author | Luo, W | - |
dc.contributor.author | Wong, KKY | - |
dc.date.accessioned | 2019-07-20T10:33:59Z | - |
dc.date.available | 2019-07-20T10:33:59Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July - 2 August 2019, p. 1884–1894 | - |
dc.identifier.uri | http://hdl.handle.net/10722/272014 | - |
dc.description | Session 3E: Vision, Robotics, Multimodal, Grounding and Speech | - |
dc.description.abstract | In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video. Specifically, given a natural sentence and a video, we localize a spatio-temporal tube in the video that semantically corresponds to the given sentence, with no reliance on any spatio-temporal annotations during training. First, a set of spatiotemporal tubes, referred to as instances, are extracted from the video. We then encode these instances and the sentence using our proposed attentive interactor which can exploit their fine-grained relationships to characterize their matching behaviors. Besides a ranking loss, a novel diversity loss is introduced to train the proposed attentive interactor to strengthen the matching behaviors of reliable instance-sentence pairs and penalize the unreliable ones. Moreover, we also contribute a dataset, called VID-sentence, based on the ImageNet video object detection dataset, to serve as a benchmark for our task. Extensive experimental results demonstrate the superiority of our model over the baseline approaches. Our code and the constructed VID-sentence dataset are available at: https://github.com/JeffCHEN2017/WSSTG.git. | - |
dc.language | eng | - |
dc.publisher | Association for Computational Linguistics | - |
dc.relation.ispartof | Annual Meeting of the Association for Computational Linguistics | - |
dc.title | Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wong, KKY: kykwong@cs.hku.hk | - |
dc.identifier.authority | Wong, KKY=rp01393 | - |
dc.identifier.hkuros | 299481 | - |
dc.identifier.spage | 1884 | - |
dc.identifier.epage | 1894 | - |
dc.publisher.place | Florence, Italy | - |