
postgraduate thesis: A data management framework for clinical interpretation of human variations

Title: A data management framework for clinical interpretation of human variations
Authors: Ou, Min (区敏)
Issue Date: 2017
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The emergence of high-throughput, low-cost next-generation sequencing (NGS) technologies has led to an explosion in genetic information for clinical care. Exploiting this massive body of genetic information could revolutionize disease diagnosis and drug development, but it also reveals an urgent need for efficient and accurate tools to analyze genetic information and, in particular, to interpret genetic variants for clinical purposes. The challenge of NGS data management and analysis lies not only in managing and analyzing the massive amount of data generated by genetic tests. Annotations of genetic variants draw on diverse sources (databases) of medical knowledge, which complicates automating variant analysis. For example, coordinate systems and naming conventions vary between sources. Integrating these annotations is an important but enormous task: the resulting databases require substantial storage space, and querying them can be very slow without proper indexing and pre-processing. Another issue is that visualization of the different aspects of variant information must be handled carefully if users are to gain a better understanding of genetics-related annotations. Existing software tools solve some of these problems but lack other features. In this thesis, I present a data management framework for the clinical interpretation of human variations. First, it uses a unified coordinate system in which annotations are categorized according to variants, genes, or proteins. Second, the annotation process can be sped up by pre-processing the data on a supercomputer, and the storage required by the integrated database can be reduced through a unified database representation with compressed fields. Based on this framework, a variant interpretation software tool called database.bio was designed and developed.
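The abstract does not say how the unified coordinate system is implemented. As a purely illustrative sketch, the Python below shows one normalization step that such a system typically needs: converting 0-based, half-open intervals (the BED/UCSC convention) and 1-based, closed intervals (the VCF/HGVS convention) into a single 1-based, closed representation, and mapping chromosome spellings such as "chr1" and "1" to one form. The `Interval` type and all function names are hypothetical and are not taken from database.bio.

```python
from dataclasses import dataclass


# Hypothetical unified representation (not database.bio's actual schema):
# chromosome names without a "chr" prefix, 1-based closed coordinates.
@dataclass(frozen=True)
class Interval:
    chrom: str  # e.g. "1", "X", "MT"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def normalize_chrom(name: str) -> str:
    """Map spellings such as "chr1", "Chr1", "1" or "chrM" to one form."""
    n = name.strip()
    if n.lower().startswith("chr"):
        n = n[3:]
    if n.upper() in {"X", "Y", "M", "MT"}:
        n = n.upper()
    return "MT" if n == "M" else n


def from_zero_based_half_open(chrom: str, start: int, end: int) -> Interval:
    """Convert a BED/UCSC-style [start, end) interval with a 0-based start.

    Zero-length intervals (e.g. insertion points) are out of scope for this sketch.
    """
    if end <= start:
        raise ValueError("expected end > start for a half-open interval")
    return Interval(normalize_chrom(chrom), start + 1, end)


def from_one_based_closed(chrom: str, start: int, end: int) -> Interval:
    """Keep a VCF/HGVS-style 1-based, closed interval unchanged (apart from the name)."""
    if end < start:
        raise ValueError("expected end >= start for a closed interval")
    return Interval(normalize_chrom(chrom), start, end)


# The same single-base position expressed in both conventions.
bed_record = from_zero_based_half_open("chr1", 99, 100)
vcf_record = from_one_based_closed("1", 100, 100)
print(bed_record == vcf_record)
```

Both calls describe the same base, so the comparison prints `True`. Once every source is mapped this way, the normalized `(chrom, start, end)` triple could serve as a join key when integrating annotations, although the thesis's own key design may differ.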
It combines variant annotation, categorization, and visualization to give clinicians or bioinformaticians insight into individual genetic characteristics. Moreover, database.bio includes categorization rules and a filter cascade function that let users focus on a smaller volume of data, and it integrates a genome browser and seven specialized tools that give a better view of variant distributions, the regions surrounding a variant, the impact on protein domains, and the affected pathways. (354 words)
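The abstract mentions a filter cascade but does not describe its rules. The Python sketch below illustrates the general idea under stated assumptions: each filter is a named predicate, filters run in a fixed order, and each stage sees only the variants that survived the previous one, so the user can see how many variants remain at every step. The `Variant` fields, the 1% allele-frequency cut-off, the consequence labels, and the gene panel are all hypothetical choices for illustration, not database.bio's actual categorization rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical minimal variant record; real annotation records hold far more fields.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int                        # 1-based position
    gene: str
    consequence: str                # e.g. "missense", "synonymous"
    population_af: Optional[float]  # population allele frequency; None if not observed


# A filter is a (name, predicate) pair; the predicate returns True to keep a variant.
Filter = Tuple[str, Callable[[Variant], bool]]


def cascade(
    variants: List[Variant], filters: List[Filter]
) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply filters in order and record how many variants survive each stage."""
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for name, keep in filters:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Illustrative thresholds and labels (assumptions, not taken from the thesis).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift"}
GENE_PANEL = {"GENE_A", "GENE_B"}  # hypothetical gene names

FILTERS: List[Filter] = [
    ("rare", lambda v: v.population_af is None or v.population_af < 0.01),
    ("protein_altering", lambda v: v.consequence in PROTEIN_ALTERING),
    ("gene_panel", lambda v: v.gene in GENE_PANEL),
]

example_variants = [
    Variant("1", 100, "GENE_A", "missense", 0.001),
    Variant("1", 200, "GENE_A", "synonymous", None),
    Variant("2", 300, "GENE_B", "missense", 0.2),
    Variant("3", 400, "GENE_C", "frameshift", None),
]

kept, stage_counts = cascade(example_variants, FILTERS)
print(stage_counts)
```

With these made-up inputs the cascade narrows four variants to one, and `stage_counts` lists the number remaining after each named stage. Keeping the per-stage counts alongside the survivors is one way to show users why their data shrank at each step.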
Degree: Master of Philosophy
Subject: Nucleotide sequence - Data processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/241409
HKU Library Item ID: b5864196

 

| DC Field | Value |
|---|---|
| dc.contributor.author | Ou, Min |
| dc.contributor.author | 区敏 |
| dc.date.accessioned | 2017-06-13T02:07:46Z |
| dc.date.available | 2017-06-13T02:07:46Z |
| dc.date.issued | 2017 |
| dc.identifier.citation | Ou, M. [区敏]. (2017). A data management framework for clinical interpretation of human variations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| dc.identifier.uri | http://hdl.handle.net/10722/241409 |
| dc.language | eng |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| dc.relation.ispartof | HKU Theses Online (HKUTO) |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. |
| dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. |
| dc.subject.lcsh | Nucleotide sequence - Data processing |
| dc.title | A data management framework for clinical interpretation of human variations |
| dc.type | PG_Thesis |
| dc.identifier.hkul | b5864196 |
| dc.description.thesisname | Master of Philosophy |
| dc.description.thesislevel | Master |
| dc.description.thesisdiscipline | Computer Science |
| dc.description.nature | published_or_final_version |
| dc.identifier.mmsid | 991026391029703414 |
