Conference Paper: Towards Diverse and Natural Image Descriptions via a Conditional GAN

Title: Towards Diverse and Natural Image Descriptions via a Conditional GAN
Authors: Dai, Bo; Fidler, Sanja; Urtasun, Raquel; Lin, Dahua
Issue Date: 2017
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2017, v. 2017-October, p. 2989-2998
Abstract: Despite substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice: maximizing the likelihood of training samples. This principle encourages high resemblance to the 'ground-truth' captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach that aims to improve naturalness and diversity, two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. Notably, training a sequence generator is nontrivial. We overcome the difficulty with Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
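
The abstract describes a generator that samples captions conditioned on an image, an evaluator that scores how well a caption fits the image, and Policy Gradient updates that let the generator learn from feedback on sampled words. The sketch below is a minimal, illustrative PyTorch rendering of that idea, not the authors' implementation: all module names, dimensions, and hyperparameters are assumptions, and for brevity it applies a single terminal reward rather than the early, per-step feedback the paper describes.

```python
# Minimal sketch (not the authors' code) of a conditional GAN for captioning
# trained with REINFORCE-style policy gradients. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, IMG_DIM, MAX_LEN = 1000, 128, 256, 512, 16

class Generator(nn.Module):
    """Samples a caption token by token, conditioned on an image feature."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_DIM, HIDDEN_DIM)
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRUCell(EMBED_DIM, HIDDEN_DIM)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def sample(self, img_feat):
        h = torch.tanh(self.img_proj(img_feat))                 # init state from image
        tok = torch.zeros(img_feat.size(0), dtype=torch.long)   # assume <BOS> = index 0
        tokens, log_probs = [], []
        for _ in range(MAX_LEN):
            h = self.rnn(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            tok = dist.sample()                                  # stochastic word choice
            tokens.append(tok)
            log_probs.append(dist.log_prob(tok))
        return torch.stack(tokens, 1), torch.stack(log_probs, 1)

class Evaluator(nn.Module):
    """Scores how well a caption matches the image (the discriminator role)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.img_proj = nn.Linear(IMG_DIM, HIDDEN_DIM)

    def forward(self, img_feat, captions):
        _, h = self.rnn(self.embed(captions))                    # caption summary
        score = (h[-1] * torch.tanh(self.img_proj(img_feat))).sum(-1)
        return torch.sigmoid(score)                              # fit score in (0, 1)

G, E = Generator(), Evaluator()
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4)

# One illustrative generator update on a fake batch of image features.
img_feat = torch.randn(4, IMG_DIM)
captions, log_probs = G.sample(img_feat)
reward = E(img_feat, captions).detach()            # evaluator feedback as reward
# REINFORCE: raise log-probability of sampled captions in proportion to reward.
g_loss = -(log_probs.sum(1) * reward).mean()
g_opt.zero_grad()
g_loss.backward()
g_opt.step()
print(f"generator loss: {g_loss.item():.4f}")
```

In the paper's framework the evaluator is also trained adversarially against the generator; the snippet above shows only the generator's policy-gradient step under those stated assumptions.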
Persistent Identifier: http://hdl.handle.net/10722/352162
ISSN: 1550-5499
2023 SCImago Journal Rankings: 12.263
ISI Accession Number ID: WOS:000425498403006


DC Field | Value | Language
dc.contributor.author | Dai, Bo | -
dc.contributor.author | Fidler, Sanja | -
dc.contributor.author | Urtasun, Raquel | -
dc.contributor.author | Lin, Dahua | -
dc.date.accessioned | 2024-12-16T03:57:04Z | -
dc.date.available | 2024-12-16T03:57:04Z | -
dc.date.issued | 2017 | -
dc.identifier.citation | Proceedings of the IEEE International Conference on Computer Vision, 2017, v. 2017-October, p. 2989-2998 | -
dc.identifier.issn | 1550-5499 | -
dc.identifier.uri | http://hdl.handle.net/10722/352162 | -
dc.description.abstract | Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the 'ground-truth' captions, while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim to improve the naturalness and diversity - two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty by Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks. | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the IEEE International Conference on Computer Vision | -
dc.title | Towards Diverse and Natural Image Descriptions via a Conditional GAN | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1109/ICCV.2017.323 | -
dc.identifier.scopus | eid_2-s2.0-85041897597 | -
dc.identifier.volume | 2017-October | -
dc.identifier.spage | 2989 | -
dc.identifier.epage | 2998 | -
dc.identifier.isi | WOS:000425498403006 | -
