Appears in Collections: postgraduate thesis: Learning general-purpose neural architectures for machine vision
Title | Learning general-purpose neural architectures for machine vision |
---|---|
Authors | |
Advisors | |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ding, M. [丁明宇]. (2022). Learning general-purpose neural architectures for machine vision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Humans can capture the correlations among multiple tasks to handle several tasks simultaneously, or to adapt to new tasks or even new modalities. For example, we can easily perform simultaneous localization and recognition, and transfer knowledge from images to videos. However, when building a complex machine system for multiple tasks, one may have to design a model for each task manually, which is redundant and costly. A state-of-the-art perception algorithm requires a customized dataset and pipeline to recognize domain-specific patterns, which makes it difficult to generalize to new scenarios. Designing a single neural network architecture that can adapt to multiple different tasks remains challenging in computer vision.
In this dissertation, we study the problem of multi-task neural architecture design -- building versatile, efficient, and generalizable algorithms that automatically design models able to work on multiple tasks or transfer between different tasks. We tackle many vision problems in multi-task architecture design from a single viewpoint, including task correlation modelling, unified architecture space design, and versatile architecture search. The dissertation is divided into two parts. In the first part, we demonstrate the importance of multi-task model design and training through two sets of mutually beneficial visual perception tasks. In the second part, we jointly tackle the fundamentals of task correlation modelling, unification of the architecture design space, and multi-task architecture search algorithms to achieve versatile and generalizable models.
In Part 1, unlike previous methods that typically focus on one task, we combine two tasks seamlessly to show their mutual benefit. In Chapter 2, we use the representation learned for depth estimation to guide the learning of 3D object detection. In Chapter 3, we show that the network architectures and features of optical flow estimation and semantic segmentation can be shared through joint learning. Although the two frameworks achieve great success and significantly outperform models trained on a single task, the correlation between the two tasks and the training pipeline are manually designed. This is because different tasks have different data distributions and require different granularities of feature representation.
In Part 2, we address this challenge by designing a unified network space for various vision tasks, and by customizing and transferring network architectures between different tasks and their combinations in the network coding space. In Chapter 4, we introduce a unified design space for multiple tasks and build a multi-task NAS benchmark (NAS-Bench-MR) on many widely used datasets. We then propose to back-propagate gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In Chapter 5, we trade off different granularities of feature representation for each task in a top-down manner. In Chapter 6, we build a dependency-inspired architecture that naturally induces visual dependencies and builds dependency trees for comprehensive visual understanding in a bottom-up manner. These works jointly address the fundamentals of multi-task model design from three different perspectives. |
Degree | Doctor of Philosophy |
Subject | Neural networks (Computer science); Computer vision |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/323712 |
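The architecture-search idea described in the abstract for Chapter 4 — back-propagating gradients of a trained neural predictor to move a continuous architecture code toward better predicted performance — can be illustrated with a minimal sketch. The quadratic surrogate `predictor`, its peak `OPT`, and the 4-dimensional code are hypothetical stand-ins chosen for illustration, not the dissertation's actual predictor or coding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a neural predictor trained on benchmark
# (architecture code, accuracy) pairs: a fixed quadratic surrogate whose
# predicted accuracy peaks at a known code OPT.
OPT = np.array([0.7, 0.3, 0.5, 0.9])

def predictor(code):
    """Predicted accuracy: maximal at OPT, falling off quadratically."""
    return 1.0 - np.sum((code - OPT) ** 2)

def predictor_grad(code):
    """Analytic gradient of the surrogate w.r.t. the architecture code."""
    return -2.0 * (code - OPT)

def search(code, lr=0.1, steps=200):
    """Gradient ascent on predicted accuracy: update the architecture
    code along the predictor's gradient, as in predictor-guided NAS."""
    for _ in range(steps):
        code = code + lr * predictor_grad(code)
    return code

start = rng.uniform(0.0, 1.0, size=4)  # random initial architecture code
found = search(start)
print(np.round(found, 3))  # converges toward OPT
```

With a real neural predictor, `predictor_grad` would come from automatic differentiation rather than a closed form, and the updated code would be decoded back into a discrete architecture.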
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Luo, P | - |
dc.contributor.advisor | Wong, KKY | - |
dc.contributor.author | Ding, Mingyu | - |
dc.contributor.author | 丁明宇 | - |
dc.date.accessioned | 2023-01-09T01:48:41Z | - |
dc.date.available | 2023-01-09T01:48:41Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Ding, M. [丁明宇]. (2022). Learning general-purpose neural architectures for machine vision. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/323712 | - |
dc.description.abstract | Humans can capture the correlations among multiple tasks to handle several tasks simultaneously, or to adapt to new tasks or even new modalities. For example, we can easily perform simultaneous localization and recognition, and transfer knowledge from images to videos. However, when building a complex machine system for multiple tasks, one may have to design a model for each task manually, which is redundant and costly. A state-of-the-art perception algorithm requires a customized dataset and pipeline to recognize domain-specific patterns, which makes it difficult to generalize to new scenarios. Designing a single neural network architecture that can adapt to multiple different tasks remains challenging in computer vision. In this dissertation, we study the problem of multi-task neural architecture design -- building versatile, efficient, and generalizable algorithms that automatically design models able to work on multiple tasks or transfer between different tasks. We tackle many vision problems in multi-task architecture design from a single viewpoint, including task correlation modelling, unified architecture space design, and versatile architecture search. The dissertation is divided into two parts. In the first part, we demonstrate the importance of multi-task model design and training through two sets of mutually beneficial visual perception tasks. In the second part, we jointly tackle the fundamentals of task correlation modelling, unification of the architecture design space, and multi-task architecture search algorithms to achieve versatile and generalizable models. In Part 1, unlike previous methods that typically focus on one task, we combine two tasks seamlessly to show their mutual benefit. In Chapter 2, we use the representation learned for depth estimation to guide the learning of 3D object detection. In Chapter 3, we show that the network architectures and features of optical flow estimation and semantic segmentation can be shared through joint learning. Although the two frameworks achieve great success and significantly outperform models trained on a single task, the correlation between the two tasks and the training pipeline are manually designed. This is because different tasks have different data distributions and require different granularities of feature representation. In Part 2, we address this challenge by designing a unified network space for various vision tasks, and by customizing and transferring network architectures between different tasks and their combinations in the network coding space. In Chapter 4, we introduce a unified design space for multiple tasks and build a multi-task NAS benchmark (NAS-Bench-MR) on many widely used datasets. We then propose to back-propagate gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In Chapter 5, we trade off different granularities of feature representation for each task in a top-down manner. In Chapter 6, we build a dependency-inspired architecture that naturally induces visual dependencies and builds dependency trees for comprehensive visual understanding in a bottom-up manner. These works jointly address the fundamentals of multi-task model design from three different perspectives. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.subject.lcsh | Computer vision | - |
dc.title | Learning general-purpose neural architectures for machine vision | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044625592303414 | - |
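The joint-learning setup of Part 1 — two tasks sharing network architecture and features, as in the flow/segmentation framework of Chapter 3 — amounts structurally to a shared backbone feeding task-specific heads. The sketch below shows that structure with dense layers and made-up dimensions (`D_IN`, `D_FEAT`, etc.); the dissertation's actual networks are convolutional and far larger, so this is an illustrative assumption, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions for illustration only.
D_IN, D_FEAT, D_SEG, D_FLOW = 64, 32, 10, 2

# One shared backbone (hard parameter sharing across tasks) ...
W_shared = rng.standard_normal((D_IN, D_FEAT)) * 0.1
# ... feeding two task-specific heads.
W_seg = rng.standard_normal((D_FEAT, D_SEG)) * 0.1
W_flow = rng.standard_normal((D_FEAT, D_FLOW)) * 0.1

def forward(x):
    """Compute shared features once, then branch into per-task outputs."""
    feat = np.maximum(x @ W_shared, 0.0)  # shared ReLU features
    return feat @ W_seg, feat @ W_flow    # segmentation logits, flow vectors

x = rng.standard_normal((4, D_IN))        # a batch of 4 flattened inputs
seg_logits, flow = forward(x)
print(seg_logits.shape, flow.shape)
```

Training would sum a loss per head and back-propagate through the shared backbone, which is how joint learning lets each task's supervision improve the other's features.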