
Postgraduate thesis: Robust text recognition in natural images

Title: Robust text recognition in natural images
Authors: Liu, Wei (劉偉)
Advisor(s): Wong, KKY
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis addresses the problem of scene text recognition, that is, recognising words that appear in various kinds of natural images. The problem has received much attention because many real-world applications can benefit from the rich semantic information embedded in text in natural images. However, recognising text in natural images is far from trivial because of many challenges. To achieve robust text recognition, this thesis focuses mainly on two of them: geometric distortion of words and variation in character scale. In the first part of this thesis, we present a novel SpaTial Attention Residue Network (STAR-Net) for recognising distorted scene text. To handle geometric distortion from the perspective of the whole word, STAR-Net employs a global spatial transformer network, which automatically locates the entire distorted word region and transforms it into an undistorted one. STAR-Net also uses residue convolutional blocks to build a very deep feature encoder that extracts discriminative features from the rectified word region. Experimental results demonstrate that STAR-Net can successfully recognise distorted text in natural images and achieves better performance than previous methods on several public benchmarks. Rather than treating the distortion of the entire word, this thesis then presents a character-aware neural network (Char-Net), which tackles the distortion problem by detecting and rectifying individual characters in distorted text images. To attend recurrently to each character region in the text image, Char-Net employs a novel recurrent RoIwarp layer. A simple spatial transformer network then takes the attended character region as input and removes its local distortion.
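As a minimal sketch of the warping step a spatial transformer performs, assuming an affine transformation and a single-channel image stored as nested Python lists, the code below builds a sampling grid from a 2x3 matrix `theta` and reads the input with bilinear interpolation. In STAR-Net a localisation network would predict the transformation from the word image and the whole model would be trained end to end; here `theta` is supplied by hand, and all function names are hypothetical. A similar sampler could, in principle, be applied to each attended character region, which is the role of Char-Net's simple local spatial transformer.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]                   # single-channel image, image[row][col]
Grid = List[List[Tuple[float, float]]]      # source (x, y) for every output pixel


def affine_grid(theta: Sequence[Sequence[float]], out_h: int, out_w: int) -> Grid:
    """For every output pixel, compute its source location under a 2x3 affine theta.

    Coordinates are normalised to [-1, 1], so the identity theta
    [[1, 0, 0], [0, 1, 0]] reproduces the input when the sizes match.
    """
    grid = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


def bilinear_sample(image: Image, grid: Grid) -> Image:
    """Read the image at fractional source locations; pixels outside count as 0."""
    in_h, in_w = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        return image[r][c] if 0 <= r < in_h and 0 <= c < in_w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Map normalised coordinates back to pixel units.
            x = (x_s + 1.0) * (in_w - 1) / 2.0
            y = (y_s + 1.0) * (in_h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            dx, dy = x - x0, y - y0
            # Weighted average of the four neighbouring pixels.
            out_row.append(pixel(y0, x0) * (1 - dx) * (1 - dy)
                           + pixel(y0, x0 + 1) * dx * (1 - dy)
                           + pixel(y0 + 1, x0) * (1 - dx) * dy
                           + pixel(y0 + 1, x0 + 1) * dx * dy)
        out.append(out_row)
    return out


def spatial_transform(image: Image, theta, out_h: int, out_w: int) -> Image:
    """Hypothetical helper: warp `image` with affine `theta` into an out_h x out_w output."""
    return bilinear_sample(image, affine_grid(theta, out_h, out_w))


# Usage: zoom into the central half of a 5x5 image whose pixels equal their column index.
img = [[float(c) for c in range(5)] for _ in range(5)]
zoomed = spatial_transform(img, [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]], 5, 5)
print(zoomed[0])  # [1.0, 1.5, 2.0, 2.5, 3.0]
```

This sketch maps -1 and 1 to the centres of the first and last pixels (the convention PyTorch calls `align_corners=True`). Bilinear interpolation is what keeps the warp differentiable with respect to the sampling locations in the original spatial transformer formulation, which is why it is used here rather than nearest-neighbour lookup.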
This approach of using a simple local transformation to remove the distortions of individual characters not only improves efficiency but also handles different types of distortion that are hard, if not impossible, to model with a single global transformation. In the third part of this thesis, we address the scale problem in scene text recognition. To extract scale-invariant features from characters of different scales, we design a novel scale-aware feature encoder. Compared with the traditional single-CNN encoder, our scale-aware feature encoder handles the scale problem explicitly, which allows the recogniser to devote more effort to other challenges. Moreover, the proposed encoder can transfer what it learns about feature encoding across different character scales. This is particularly important when the training dataset has a highly imbalanced distribution of character scales, since training on such a dataset biases the encoder towards extracting features at the predominant scale. Finally, we present a scale-aware Char-Net that combines the scale-aware feature encoder with Char-Net to handle characters of varying scales and words with severe distortions simultaneously.
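The text above does not describe the encoder's internal structure. As a rough sketch, assuming (not taken from the thesis) that scale awareness comes from running one shared encoder over several rescaled copies of the input and letting a softmax attention weigh the scales at each position, the fusion step might look like the code below. `toy_encoder`, `score_weights` and every function name are hypothetical; in a real model the encoder would be a CNN and the scoring weights would be learned.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # single-channel image, image[row][col]
FeatureSeq = List[List[float]]   # `length` positions x D channels


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize, used here to build the multi-scale inputs."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def toy_encoder(image: Image, length: int) -> FeatureSeq:
    """Hypothetical stand-in for a CNN: per-column [mean, max], resampled to `length`."""
    h, w = len(image), len(image[0])
    cols = [[sum(image[r][c] for r in range(h)) / h, max(image[r][c] for r in range(h))]
            for c in range(w)]
    return [cols[t * w // length] for t in range(length)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scale_aware_encode(image: Image,
                       scales: Sequence[Tuple[int, int]],
                       encoder: Callable[[Image, int], FeatureSeq],
                       score_weights: Sequence[float],
                       length: int) -> Tuple[FeatureSeq, List[List[float]]]:
    """Fuse features from several input scales with per-position softmax attention."""
    # The same encoder function is applied at every scale; in a trained model this
    # corresponds to shared weights, one assumed way to let learning transfer across scales.
    per_scale = [encoder(resize_nearest(image, h, w), length) for h, w in scales]
    fused, attention = [], []
    for t in range(length):
        # Hypothetical linear score for each scale's feature vector at position t.
        scores = [sum(wi * fi for wi, fi in zip(score_weights, feats[t]))
                  for feats in per_scale]
        alphas = softmax(scores)
        dim = len(per_scale[0][t])
        # Weighted (convex) combination of the per-scale features.
        fused.append([sum(a * feats[t][d] for a, feats in zip(alphas, per_scale))
                      for d in range(dim)])
        attention.append(alphas)
    return fused, attention


# Usage: three rescaled copies of a small striped image, fused to 4 positions.
img = [[float((r + c) % 3) for c in range(8)] for r in range(4)]
features, weights = scale_aware_encode(img, [(4, 8), (8, 16), (2, 4)], toy_encoder,
                                       [1.0, -0.5], length=4)
```

Each entry of `weights` is a distribution over the input scales for one position, so the fused feature at that position is a convex combination of the per-scale features rather than a fixed choice of a single scale.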
Degree: Doctor of Philosophy
Subject: Pattern recognition systems
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/273755

DC Field | Value | Language
dc.contributor.advisor | Wong, KKY | -
dc.contributor.author | Liu, Wei | -
dc.contributor.author | 劉偉 | -
dc.date.accessioned | 2019-08-14T03:29:46Z | -
dc.date.available | 2019-08-14T03:29:46Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Liu, W. [劉偉]. (2019). Robust text recognition in natural images. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/273755 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Pattern recognition systems | -
dc.title | Robust text recognition in natural images | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2019 | -
dc.identifier.mmsid | 991044128172303414 | -