File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Conference Paper: Move Forward and Tell: A Progressive Generator of Video Descriptions

TitleMove Forward and Tell: A Progressive Generator of Video Descriptions
Authors
KeywordsMove forward and tell
Recurrent network
Reinforcement learning
Repetition evaluation
Video captioning
Issue Date2018
Citation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, v. 11215 LNCS, p. 489-505 How to Cite?
AbstractWe present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. On the contrary, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. Particularly, the selection of clips and the production of sentences are done jointly and progressively driven by a recurrent network – what to describe next depends on what have been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise.
Persistent Identifierhttp://hdl.handle.net/10722/352472
ISSN
2023 SCImago Journal Rankings: 0.606

 

DC FieldValueLanguage
dc.contributor.authorXiong, Yilei-
dc.contributor.authorDai, Bo-
dc.contributor.authorLin, Dahua-
dc.date.accessioned2024-12-16T03:59:16Z-
dc.date.available2024-12-16T03:59:16Z-
dc.date.issued2018-
dc.identifier.citationLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, v. 11215 LNCS, p. 489-505-
dc.identifier.issn0302-9743-
dc.identifier.urihttp://hdl.handle.net/10722/352472-
dc.description.abstractWe present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. On the contrary, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. Particularly, the selection of clips and the production of sentences are done jointly and progressively driven by a recurrent network – what to describe next depends on what have been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise.-
dc.languageeng-
dc.relation.ispartofLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)-
dc.subjectMove forward and tell-
dc.subjectRecurrent network-
dc.subjectReinforcement learning-
dc.subjectRepetition evaluation-
dc.subjectVideo captioning-
dc.titleMove Forward and Tell: A Progressive Generator of Video Descriptions-
dc.typeConference_Paper-
dc.description.naturelink_to_subscribed_fulltext-
dc.identifier.doi10.1007/978-3-030-01252-6_29-
dc.identifier.scopuseid_2-s2.0-85055133234-
dc.identifier.volume11215 LNCS-
dc.identifier.spage489-
dc.identifier.epage505-
dc.identifier.eissn1611-3349-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats