
Postgraduate thesis: Nonlocal feature learning for image segmentation, facial video hallucination, and volumetric segmentation

Title: Nonlocal feature learning for image segmentation, facial video hallucination, and volumetric segmentation
Authors: Fang, Chaowei (方超伟)
Advisor(s): Yu, Y
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Fang, C. [方超伟]. (2019). Nonlocal feature learning for image segmentation, facial video hallucination, and volumetric segmentation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Nonlocal spatial/temporal feature learning is vital to image segmentation, facial video hallucination, and volumetric segmentation. In image/volumetric segmentation, it is computationally efficient to extract local features for every pixel/voxel. However, these local features lack the nonlocal context information that is paramount for strengthening their representative capability and improving segmentation performance. In the context of facial video, strong inter-frame coherence can be exploited by learning nonlocal temporal features. We propose novel algorithms to exploit nonlocal contextual features in image segmentation, facial video hallucination, and volumetric segmentation, respectively.

First, for image segmentation, we propose a new multi-dimensional nonlinear embedding, named \emph{Piecewise Flat Embedding} (PFE), to learn pixel-wise nonlocal features. Based on the theory of sparse signal recovery, piecewise flat embedding attempts to recover a piecewise constant image representation with sparse region boundaries and sparse cluster value scattering. The resulting embedding exhibits interesting properties, such as suppressing slowly varying signals, and offers an image representation with higher region identifiability, which is desirable for image segmentation and high-level semantic analysis tasks. We formulate our embedding as a variant of the Laplacian Eigenmap embedding with an $L_{1,p}\; (0<p\leq1)$ regularization term to promote sparse solutions. We first devise a two-stage numerical algorithm based on Bregman iterations to compute $L_{1,1}$-regularized piecewise flat embeddings, and then generalize this algorithm through iterative reweighting to solve the general $L_{1,p}$-regularized problem. To demonstrate its efficacy, we integrate PFE into two existing image segmentation frameworks: segmentation based on clustering and hierarchical segmentation based on contour detection. Experiments on four benchmark image datasets show that segmentation algorithms incorporating our embedding channels achieve significantly improved results.

Second, taking advantage of nonlocal inter-frame dependency in facial videos, we propose a \emph{Self-Enhanced Convolutional Network} for facial video hallucination, which makes full use of preceding super-resolved frames and a temporal window of adjacent low-resolution frames. Specifically, the algorithm first obtains an initial high-resolution inference of each frame by taking a sequence of consecutive low-resolution inputs into consideration through temporal consistency modelling. It then recurrently exploits the reconstructed results and intermediate features of a sequence of preceding frames to improve the initial super-resolution of the current frame by modelling the coherence of structural facial features across frames. Quantitative and qualitative evaluations demonstrate the superiority of the proposed algorithm over state-of-the-art methods. Moreover, our algorithm also achieves excellent performance on general video super-resolution in a single-shot setting.

Third, to address pancreas segmentation in 3D computed tomography volumes, we propose a novel end-to-end network, the \emph{Globally Guided Progressive Fusion Network}, as an effective and efficient solution to volumetric segmentation, which involves both global features and complicated 3D geometric information. A progressive fusion network is devised to extract 3D information from a moderate number of neighbouring slices and to predict a probability map for the segmentation of each slice. An independent branch that extracts global features from downsampled slices is further integrated into the network. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on two pancreas datasets.
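The abstract formulates PFE as a Laplacian Eigenmap variant with an $L_{1,p}$ regularizer; as an illustrative sketch only (the notation below is assumed, not quoted from the thesis), such an objective can be written as

```latex
% Sketch of an L_{1,p}-regularized embedding objective (assumed notation):
% y_i \in R^d is the embedding of pixel i, w_{ij} a pairwise affinity,
% D the degree matrix of the affinity graph, Y the stacked embedding matrix.
\min_{Y} \; \sum_{i,j} w_{ij} \,\lVert \mathbf{y}_i - \mathbf{y}_j \rVert_p
\qquad \text{s.t.} \qquad Y^{\top} D\, Y = I .
```

For $p = 1$ the penalty is convex in the pairwise differences and amenable to Bregman-iteration solvers; for $0 < p < 1$, iterative reweighting replaces each $p$-norm term with a weighted $1$-norm whose weights are updated from the previous iterate, matching the two-stage strategy the abstract describes.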
Degree: Doctor of Philosophy
Subjects: Image segmentation; Computer vision; Image processing; Imaging systems in medicine
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/279774

 

DC Field: Value
dc.contributor.advisor: Yu, Y
dc.contributor.author: Fang, Chaowei
dc.contributor.author: 方超伟
dc.date.accessioned: 2019-12-10T10:04:50Z
dc.date.available: 2019-12-10T10:04:50Z
dc.date.issued: 2019
dc.identifier.citation: Fang, C. [方超伟]. (2019). Nonlocal feature learning for image segmentation, facial video hallucination, and volumetric segmentation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/279774
dc.description.abstract: Nonlocal spatial/temporal feature learning is vital to image segmentation, facial video hallucination, and volumetric segmentation. In image/volumetric segmentation, it is computationally efficient to extract local features for every pixel/voxel. However, these local features lack the nonlocal context information that is paramount for strengthening their representative capability and improving segmentation performance. In the context of facial video, strong inter-frame coherence can be exploited by learning nonlocal temporal features. We propose novel algorithms to exploit nonlocal contextual features in image segmentation, facial video hallucination, and volumetric segmentation, respectively. First, for image segmentation, we propose a new multi-dimensional nonlinear embedding, named \emph{Piecewise Flat Embedding} (PFE), to learn pixel-wise nonlocal features. Based on the theory of sparse signal recovery, piecewise flat embedding attempts to recover a piecewise constant image representation with sparse region boundaries and sparse cluster value scattering. The resulting embedding exhibits interesting properties, such as suppressing slowly varying signals, and offers an image representation with higher region identifiability, which is desirable for image segmentation and high-level semantic analysis tasks. We formulate our embedding as a variant of the Laplacian Eigenmap embedding with an $L_{1,p}\; (0<p\leq1)$ regularization term to promote sparse solutions. We first devise a two-stage numerical algorithm based on Bregman iterations to compute $L_{1,1}$-regularized piecewise flat embeddings, and then generalize this algorithm through iterative reweighting to solve the general $L_{1,p}$-regularized problem. To demonstrate its efficacy, we integrate PFE into two existing image segmentation frameworks: segmentation based on clustering and hierarchical segmentation based on contour detection. Experiments on four benchmark image datasets show that segmentation algorithms incorporating our embedding channels achieve significantly improved results. Second, taking advantage of nonlocal inter-frame dependency in facial videos, we propose a \emph{Self-Enhanced Convolutional Network} for facial video hallucination, which makes full use of preceding super-resolved frames and a temporal window of adjacent low-resolution frames. Specifically, the algorithm first obtains an initial high-resolution inference of each frame by taking a sequence of consecutive low-resolution inputs into consideration through temporal consistency modelling. It then recurrently exploits the reconstructed results and intermediate features of a sequence of preceding frames to improve the initial super-resolution of the current frame by modelling the coherence of structural facial features across frames. Quantitative and qualitative evaluations demonstrate the superiority of the proposed algorithm over state-of-the-art methods. Moreover, our algorithm also achieves excellent performance on general video super-resolution in a single-shot setting. Third, to address pancreas segmentation in 3D computed tomography volumes, we propose a novel end-to-end network, the \emph{Globally Guided Progressive Fusion Network}, as an effective and efficient solution to volumetric segmentation, which involves both global features and complicated 3D geometric information. A progressive fusion network is devised to extract 3D information from a moderate number of neighbouring slices and to predict a probability map for the segmentation of each slice. An independent branch that extracts global features from downsampled slices is further integrated into the network. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on two pancreas datasets.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Image segmentation
dc.subject.lcsh: Computer vision
dc.subject.lcsh: Image processing
dc.subject.lcsh: Imaging systems in medicine
dc.title: Nonlocal feature learning for image segmentation, facial video hallucination, and volumetric segmentation
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.5353/th_991044168862703414
dc.date.hkucongregation: 2019
dc.identifier.mmsid: 991044168862703414
