Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data

Liu, Xihui; Li, Hongsheng; Shao, Jing; Chen, Dapeng; Wang, Xiaogang

Links for fulltext

Publisher Website: 10.1007/978-3-030-01267-0_21
Scopus: eid_2-s2.0-85055422041
WOS: WOS:000612999000021

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Electrical & Electronic Engineering: Conference papers

Conference Paper: Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data

Title	Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data
Authors	Liu, Xihui; Li, Hongsheng; Shao, Jing; Chen, Dapeng; Wang, Xiaogang
Keywords	Image captioning; Language and vision; Text-image retrieval
Issue Date	2018
Publisher	Springer
Citation	15th European Conference on Computer Vision (ECCV 2018), Munich, Germany, September 8-14 2018. In Ferrari, V, Hebert, M, Sminchisescu, C, et al. (Eds), Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV, p. 353-369. Cham, Switzerland: Springer, 2018. DOI: http://dx.doi.org/10.1007/978-3-030-01267-0_21
Abstract	The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into the stereotype of replicating frequent phrases or sentences, neglecting the unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings two unique advantages: (1) The self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach can utilize a large amount of unlabeled images to boost captioning performance with no additional annotations. We demonstrate the effectiveness of the proposed retrieval-guided method on COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions.
Persistent Identifier	http://hdl.handle.net/10722/316501
ISBN	9783030012663
ISSN	0302-9743 (2020 SCImago Journal Rankings: 0.249)
ISI Accession Number ID	WOS:000612999000021
Series/Report no.	Lecture Notes in Computer Science; 11219; LNCS Sublibrary. SL 6, Image Processing, Computer Vision, Pattern Recognition, and Graphics
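The abstract describes the core mechanism: a retrieval module checks whether a generated caption picks out its own image among other images, and that check serves as a training signal that needs no human labels. The following Python sketch illustrates one way such a self-retrieval reward could be computed; it is not the authors' implementation. It assumes that caption and image embeddings have already been produced by some text and image encoders (not shown), that the retrieval criterion is a hinge loss with the hardest in-batch negative (VSE++-style), and that the result is mixed with a caption-quality score such as CIDEr through a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def self_retrieval_reward(caption_embs: Sequence[Vector],
                          image_embs: Sequence[Vector],
                          margin: float = 0.2) -> List[float]:
    """Score each generated caption by how well it retrieves its own image.

    Caption i is paired with image i; every other image in the batch is a
    negative. ASSUMPTION: hinge loss against the hardest in-batch negative
    (VSE++-style), negated so a more discriminative caption gets a higher
    reward. The best possible reward is 0.
    """
    n = len(caption_embs)
    if n != len(image_embs):
        raise ValueError("need one image embedding per caption embedding")
    if n < 2:
        raise ValueError("need at least two images so that a negative exists")
    rewards = []
    for i, cap in enumerate(caption_embs):
        positive = cosine(cap, image_embs[i])
        # Most confusable other image for this caption.
        hardest_negative = max(cosine(cap, image_embs[j]) for j in range(n) if j != i)
        loss = max(0.0, margin - positive + hardest_negative)
        rewards.append(-loss)
    return rewards


def combined_reward(retrieval: Sequence[float],
                    cider: Sequence[Optional[float]],
                    alpha: float = 1.0) -> List[float]:
    """Mix a caption-quality score with the self-retrieval reward.

    cider[i] is None for an unlabeled image (no reference captions), so that
    image contributes only the retrieval term. `alpha` is a HYPOTHETICAL weight.
    """
    out = []
    for r, c in zip(retrieval, cider):
        out.append(alpha * r if c is None else c + alpha * r)
    return out


if __name__ == "__main__":
    # Toy 2-D embeddings: caption 0 is equally close to both images (not discriminative).
    caps = [[1.0, 1.0], [0.0, 1.0]]
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    ret = self_retrieval_reward(caps, imgs)
    print(ret)
    print(combined_reward(ret, [1.0, None], alpha=2.0))
```

The `None` branch in `combined_reward` mirrors the abstract's point about unlabeled images: without reference captions there is nothing to compute a quality score against, but the retrieval term still applies, so those images can still contribute a training signal.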

DC Field	Value	Language
dc.contributor.author	Liu, Xihui	-
dc.contributor.author	Li, Hongsheng	-
dc.contributor.author	Shao, Jing	-
dc.contributor.author	Chen, Dapeng	-
dc.contributor.author	Wang, Xiaogang	-
dc.date.accessioned	2022-09-14T11:40:37Z	-
dc.date.available	2022-09-14T11:40:37Z	-
dc.date.issued	2018	-
dc.identifier.citation	15th European Conference on Computer Vision (ECCV 2018), Munich, Germany, September 8-14 2018. In Ferrari, V, Hebert, M, Sminchisescu, C, et al. (Eds), Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV, p. 353-369. Cham, Switzerland: Springer, 2018	-
dc.identifier.isbn	9783030012663	-
dc.identifier.issn	0302-9743	-
dc.identifier.uri	http://hdl.handle.net/10722/316501	-
dc.description.abstract	The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into the stereotype of replicating frequent phrases or sentences, neglecting the unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings two unique advantages: (1) The self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach can utilize a large amount of unlabeled images to boost captioning performance with no additional annotations. We demonstrate the effectiveness of the proposed retrieval-guided method on COCO and Flickr30k captioning datasets, and show its superior captioning performance with more discriminative captions.	-
dc.language	eng	-
dc.publisher	Springer	-
dc.relation.ispartof	Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV	-
dc.relation.ispartofseries	Lecture Notes in Computer Science ; 11219	-
dc.relation.ispartofseries	LNCS Sublibrary. SL 6, Image Processing, Computer Vision, Pattern Recognition, and Graphics	-
dc.subject	Image captioning	-
dc.subject	Language and vision	-
dc.subject	Text-image retrieval	-
dc.title	Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1007/978-3-030-01267-0_21	-
dc.identifier.scopus	eid_2-s2.0-85055422041	-
dc.identifier.spage	353	-
dc.identifier.epage	369	-
dc.identifier.eissn	1611-3349	-
dc.identifier.isi	WOS:000612999000021	-
dc.publisher.place	Cham, Switzerland	-
