File Download: There are no files associated with this item.

Conference Paper: Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition

Title: Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition
Authors: Liu, W; Chen, C; Wong, KKY
Keywords: Text Recognition; Attention Mechanism; RNN
Issue Date: 2018
Publisher: Association for the Advancement of Artificial Intelligence (AAAI) Press
Citation: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, 2-7 February 2018, p. 7154-7161
Abstract: In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and an LSTM-based decoder. Unlike previous work, which employed a global spatial transformer network to rectify the entire distorted text image, we take the approach of detecting and rectifying individual characters. To this end, we introduce a novel hierarchical attention mechanism (HAM) which consists of a recurrent RoIWarp layer and a character-level attention layer. The recurrent RoIWarp layer sequentially extracts a feature region corresponding to a character from the feature map produced by the word-level encoder, and feeds it to the character-level encoder, which removes the distortion of the character through a simple spatial transformer and further encodes the character region. The character-level attention layer then attends to the most relevant features of the feature map produced by the character-level encoder and composes a context vector, which is finally fed to the LSTM-based decoder for decoding. This approach of adopting a simple local transformation to model the distortion of individual characters not only results in improved efficiency, but can also handle different types of distortion that are hard, if not impossible, to model with a single global transformation. Experiments have been conducted on six public benchmark datasets. Our results show that Char-Net can achieve state-of-the-art performance on all the benchmarks, especially on the IC-IST, which contains scene text with large distortion. Code will be made available.
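
Since the abstract walks through the architecture step by step, a short code sketch may help fix the data flow. The following is a hypothetical PyTorch rendering of that pipeline, not the authors' released code: the module names (CharNetSketch, loc_head), layer sizes, toy CNN backbones, and fixed 8x8 character window are all assumptions, and the RoIWarp extraction and the character-level spatial transformer are collapsed into a single affine warp per step for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CharNetSketch(nn.Module):
    def __init__(self, num_classes=37, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Word-level encoder: a toy CNN over the whole text image (assumption;
        # the paper's backbone is not described in this record).
        self.word_encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Predicts one affine transform per decoding step, conditioned on the
        # decoder state: this plays the role of the recurrent RoIWarp layer
        # (and, simplified here, of the character-level spatial transformer).
        self.loc_head = nn.Linear(hidden_dim, 6)
        nn.init.zeros_(self.loc_head.weight)  # start from the identity warp
        with torch.no_grad():
            self.loc_head.bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))
        # Character-level encoder: further encodes the rectified region.
        self.char_encoder = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Character-level attention: one score per spatial position.
        self.attn_score = nn.Conv2d(feat_dim, 1, 1)
        self.decoder = nn.LSTMCell(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, images, max_chars=20):
        feats = self.word_encoder(images)                # (B, C, H, W)
        B = images.size(0)
        h = feats.new_zeros(B, self.hidden_dim)
        c = feats.new_zeros(B, self.hidden_dim)
        logits = []
        for _ in range(max_chars):
            # Recurrent RoIWarp: warp a character-sized window out of the
            # word-level feature map, conditioned on the decoder state.
            theta = self.loc_head(h).view(B, 2, 3)
            grid = F.affine_grid(theta, (B, feats.size(1), 8, 8),
                                 align_corners=False)
            char_region = F.grid_sample(feats, grid, align_corners=False)
            char_feats = self.char_encoder(char_region)  # (B, C, 8, 8)
            # Character-level attention composes a context vector.
            alpha = self.attn_score(char_feats).flatten(1).softmax(1)
            context = (alpha.view(B, 1, 8, 8) * char_feats).sum(dim=(2, 3))
            h, c = self.decoder(context, (h, c))         # LSTM-based decoder step
            logits.append(self.classifier(h))
        return torch.stack(logits, dim=1)                # (B, max_chars, classes)

As a usage check, CharNetSketch()(torch.randn(2, 1, 32, 100)) returns a (2, 20, 37) tensor of per-step character logits; a real training setup would add a stop symbol, ground-truth supervision, and the paper's actual encoders.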
Description: Session: AAAI18 - Vision
Persistent Identifier: http://hdl.handle.net/10722/250580

DC Field: Value
dc.contributor.author: Liu, W
dc.contributor.author: Chen, C
dc.contributor.author: Wong, KKY
dc.date.accessioned: 2018-01-18T04:29:17Z
dc.date.available: 2018-01-18T04:29:17Z
dc.date.issued: 2018
dc.identifier.citation: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, 2-7 February 2018, p. 7154-7161
dc.identifier.uri: http://hdl.handle.net/10722/250580
dc.description: Session: AAAI18 - Vision
dc.description.abstract: In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and an LSTM-based decoder. Unlike previous work, which employed a global spatial transformer network to rectify the entire distorted text image, we take the approach of detecting and rectifying individual characters. To this end, we introduce a novel hierarchical attention mechanism (HAM) which consists of a recurrent RoIWarp layer and a character-level attention layer. The recurrent RoIWarp layer sequentially extracts a feature region corresponding to a character from the feature map produced by the word-level encoder, and feeds it to the character-level encoder, which removes the distortion of the character through a simple spatial transformer and further encodes the character region. The character-level attention layer then attends to the most relevant features of the feature map produced by the character-level encoder and composes a context vector, which is finally fed to the LSTM-based decoder for decoding. This approach of adopting a simple local transformation to model the distortion of individual characters not only results in improved efficiency, but can also handle different types of distortion that are hard, if not impossible, to model with a single global transformation. Experiments have been conducted on six public benchmark datasets. Our results show that Char-Net can achieve state-of-the-art performance on all the benchmarks, especially on the IC-IST, which contains scene text with large distortion. Code will be made available.
dc.language: eng
dc.publisher: Association for the Advancement of Artificial Intelligence (AAAI) Press
dc.relation.ispartof: AAAI Conference on Artificial Intelligence, AAAI-18
dc.subject: Text Recognition
dc.subject: Attention Mechanism
dc.subject: RNN
dc.title: Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition
dc.type: Conference_Paper
dc.identifier.email: Wong, KKY: kykwong@cs.hku.hk
dc.identifier.authority: Wong, KKY=rp01393
dc.identifier.hkuros: 284067
dc.identifier.spage: 7154
dc.identifier.epage: 7161
dc.publisher.place: United States
