A data management framework for clinical interpretation of human variations

Ou, Min; 区敏

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: A data management framework for clinical interpretation of human variations

Title	A data management framework for clinical interpretation of human variations
Authors	Ou, Min 区敏
Issue Date	2017
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	The emergence of high-throughput, low-cost next-generation sequencing (NGS) technologies has led to an explosion in genetic information for clinical care. The exploitation of such massive genetic information has the potential to revolutionize disease diagnosis and drug development, but it also reveals an urgent need for efficient and accurate tools to analyze genetic information, in particular, to interpret genetic variants for clinical purposes. The challenge of NGS data management and analysis lies not only in managing and analyzing the massive amount of data generated from genetic tests. Diverse sources (databases) of medical knowledge in annotations of genetic variants also complicate the process of automating variant analysis. For example, the coordinate system and naming convention vary from source to source. Integrating these annotations is an important but enormous task: the resulting databases require substantial storage space, and querying can be very slow without proper indexing and pre-processing. Another issue is that, to help users better understand genetics-related annotations, the visualization of different aspects of variant information needs to be handled carefully. Existing software tools have solved some of these problems, but lack other features. In this thesis, I present a data management framework for the clinical interpretation of human variations. First, it involves a unified coordinate system in which annotations are categorized according to variants, genes or proteins. Second, the annotation process can be sped up by pre-processing the data on a supercomputer, and the integrated database storage can be reduced via a unified database representation with compressed fields. Based on this framework, a variant interpretation software tool called database.bio was designed and developed. It combines variant annotation, categorization, and visualization in order to give clinical doctors and bioinformaticians insight into individual genetic characteristics. Moreover, categorization rules and a filter cascade function are included in database.bio to allow users to focus on a smaller volume of data, and a genome browser and seven specific tools are integrated to provide a better view of variant distributions, the nearby regions of the variant, the impact on the protein domain, and the pathways. (354 words)
Degree	Master of Philosophy
Subject	Nucleotide sequence - Data processing
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/241409
HKU Library Item ID	b5864196

DC Field	Value	Language
dc.contributor.author	Ou, Min	-
dc.contributor.author	区敏	-
dc.date.accessioned	2017-06-13T02:07:46Z	-
dc.date.available	2017-06-13T02:07:46Z	-
dc.date.issued	2017	-
dc.identifier.citation	Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/241409	-
dc.description.abstract	The emergence of high-throughput, low-cost next-generation sequencing (NGS) technologies has led to an explosion in genetic information for clinical care. The exploitation of such massive genetic information has the potential to revolutionize disease diagnosis and drug development, but it also reveals an urgent need for efficient and accurate tools to analyze genetic information, in particular, to interpret genetic variants for clinical purposes. The challenge of NGS data management and analysis lies not only in managing and analyzing the massive amount of data generated from genetic tests. Diverse sources (databases) of medical knowledge in annotations of genetic variants also complicate the process of automating variant analysis. For example, the coordinate system and naming convention vary from source to source. Integrating these annotations is an important but enormous task: the resulting databases require substantial storage space, and querying can be very slow without proper indexing and pre-processing. Another issue is that, to help users better understand genetics-related annotations, the visualization of different aspects of variant information needs to be handled carefully. Existing software tools have solved some of these problems, but lack other features. In this thesis, I present a data management framework for the clinical interpretation of human variations. First, it involves a unified coordinate system in which annotations are categorized according to variants, genes or proteins. Second, the annotation process can be sped up by pre-processing the data on a supercomputer, and the integrated database storage can be reduced via a unified database representation with compressed fields. Based on this framework, a variant interpretation software tool called database.bio was designed and developed. It combines variant annotation, categorization, and visualization in order to give clinical doctors and bioinformaticians insight into individual genetic characteristics. Moreover, categorization rules and a filter cascade function are included in database.bio to allow users to focus on a smaller volume of data, and a genome browser and seven specific tools are integrated to provide a better view of variant distributions, the nearby regions of the variant, the impact on the protein domain, and the pathways. (354 words)	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.rights	The author retains all proprietary rights, (such as patent rights) and the right to use in future works.	-
dc.subject.lcsh	Nucleotide sequence - Data processing	-
dc.title	A data management framework for clinical interpretation of human variations	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5864196	-
dc.description.thesisname	Master of Philosophy	-
dc.description.thesislevel	Master	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.mmsid	991026391029703414	-
