postgraduate thesis: A data management framework for clinical interpretation of human variations
Title | A data management framework for clinical interpretation of human variations |
---|---|
Authors | Ou, Min (区敏) |
Issue Date | 2017 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | The emergence of high-throughput, low-cost next-generation sequencing (NGS)
technologies has led to an explosion in genetic information for clinical care.
The exploitation of such massive genetic information has the potential to
revolutionize disease diagnosis and drug development, but it also reveals an
urgent need for efficient and accurate tools to analyze genetic information, in
particular, to interpret genetic variants for clinical purposes.
The challenge of NGS data management and analysis is not only in managing and analyzing the massive amount of data generated from genetic tests.
Diverse sources (databases) of medical knowledge in annotations of genetic
variants complicate the process of automating the variant analysis. For example, the coordinate system and naming convention vary from case to case.
Integrating these annotations is an important but enormous task; the resulting databases require substantial storage space, and querying can be very
slow without proper indexing and pre-processing. Another issue is that, in
order to help users get a better understanding of genetics-related annotations,
visualization of different aspects of variant information needs to be handled
carefully. Existing software tools have solved some of these problems, but lack
other features.
In this thesis, I present a data management framework for the clinical interpretation of human variations. First, it involves a unified coordinate system
in which annotations are categorized according to variants, genes or proteins.
Second, the annotation process can be sped up by pre-processing the data
on a supercomputer, and the integrated database storage can be reduced via a
unified database representation with compressed fields. Based on this framework, a variant interpretation software tool called database.bio was designed and developed. It combines variant annotation, categorization, and visualization in order to support clinical doctors or bioinformaticians with insight
into individual genetic characteristics. Moreover, the categorization rules and
filter cascade function are included in database.bio to allow users to focus on
a smaller volume of data, and a genome browser and seven specific tools are
integrated to provide a better view of variant distributions, the nearby regions
of the variant, the impact on the protein domain, and the pathways.
(354 words) |
Degree | Master of Philosophy |
Subject | Nucleotide sequence - Data processing |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/241409 |
HKU Library Item ID | b5864196 |
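
The abstract notes that coordinate systems and naming conventions differ between annotation sources. As a rough illustration only, and not the thesis's actual implementation, the sketch below shows one way such records might be brought into a single convention. It assumes two common input styles: 0-based, half-open intervals as used by BED files, and 1-based, closed intervals as used by VCF. All names here (`Interval`, `normalize_chrom`, `from_bed`, `from_one_based`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 1-based, fully closed coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int


def normalize_chrom(name: str) -> str:
    """Map 'chr1', 'Chr1', '1', 'chrM', 'MT' onto one style: '1', 'MT'."""
    stripped = name.strip()
    if stripped.lower().startswith("chr"):
        stripped = stripped[3:]
    stripped = stripped.upper()
    # Assumption: the mitochondrial chromosome is written 'MT' in the unified style.
    return "MT" if stripped == "M" else stripped


def from_bed(chrom: str, start: int, end: int) -> Interval:
    """Convert a 0-based, half-open (BED-style) interval to 1-based, closed."""
    if start < 0 or end <= start:
        # This sketch rejects zero-length intervals for simplicity.
        raise ValueError("expected 0 <= start < end")
    return Interval(normalize_chrom(chrom), start + 1, end)


def from_one_based(chrom: str, start: int, end: int) -> Interval:
    """Accept a 1-based, closed (VCF-style) interval, normalizing only the name."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return Interval(normalize_chrom(chrom), start, end)


if __name__ == "__main__":
    # The same single base described in two conventions compares equal.
    print(from_bed("chr1", 99, 100) == from_one_based("1", 100, 100))
```

Once every source is converted this way, annotations from different databases can be joined on the same `(chrom, start, end)` key.
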
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ou, Min | - |
dc.contributor.author | 区敏 | - |
dc.date.accessioned | 2017-06-13T02:07:46Z | - |
dc.date.available | 2017-06-13T02:07:46Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/241409 | - |
dc.description.abstract | The emergence of high-throughput, low-cost next-generation sequencing (NGS) technologies has led to an explosion in genetic information for clinical care. The exploitation of such massive genetic information has the potential to revolutionize disease diagnosis and drug development, but it also reveals an urgent need for efficient and accurate tools to analyze genetic information, in particular, to interpret genetic variants for clinical purposes. The challenge of NGS data management and analysis is not only in managing and analyzing the massive amount of data generated from genetic tests. Diverse sources (databases) of medical knowledge in annotations of genetic variants complicate the process of automating the variant analysis. For example, the coordinate system and naming convention vary from case to case. Integrating these annotations is an important but enormous task; the resulting databases require substantial storage space, and querying can be very slow without proper indexing and pre-processing. Another issue is that, in order to help users get a better understanding of genetics-related annotations, visualization of different aspects of variant information needs to be handled carefully. Existing software tools have solved some of these problems, but lack other features. In this thesis, I present a data management framework for the clinical interpretation of human variations. First, it involves a unified coordinate system in which annotations are categorized according to variants, genes or proteins. Second, the annotation process can be sped up by pre-processing the data on a supercomputer, and the integrated database storage can be reduced via a unified database representation with compressed fields. Based on this framework, a variant interpretation software tool called database.bio was designed and developed. It combines variant annotation, categorization, and visualization in order to support clinical doctors or bioinformaticians with insight into individual genetic characteristics. Moreover, the categorization rules and filter cascade function are included in database.bio to allow users to focus on a smaller volume of data, and a genome browser and seven specific tools are integrated to provide a better view of variant distributions, the nearby regions of the variant, the impact on the protein domain, and the pathways. (354 words) | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Nucleotide sequence - Data processing | - |
dc.title | A data management framework for clinical interpretation of human variations | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5864196 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.mmsid | 991026391029703414 | - |