postgraduate thesis: A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities
Title | A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities |
---|---|
Authors | Chen, Ruoyan (陈若言) |
Issue Date | 2017 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Chen, R. [陈若言]. (2017). A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Rapid development of Next Generation Sequencing (NGS) technology has substantially transformed the landscape of biomedical research. As a result, analyses based on these sequencing technologies, especially methods and frameworks for variant detection, are becoming more diverse and powerful. Together, these provide enormous insight into population diversity and genetic diseases, and are rapidly moving the field of personalized medicine forward.
Nevertheless, despite the wide variety of platforms and data types available for re-sequencing the human genome, and of strategies for mutation identification, there are still detection gaps in the spectrum of human genomic variation, as well as twilight zones in the human genome that remain untouched or overlooked. Among these, the identification of short inversions and of variants in repetitive regions are two important yet neglected areas. Accordingly, this thesis presents two analyses that aim to bring insight into the twilight zones of NGS-based analysis.
The first analysis introduces a new framework, SRinversion, which was developed specifically to identify inversions smaller than 1 kb and is particularly suited to the short reads of NGS data in their present form. A summary of public databases of genomic variation clearly indicates that the identification of inversions, especially those shorter than 100 bp, has lagged behind that of other types of variation. To fill this gap in the full spectrum of variants, SRinversion applies an improved split-read method to examine unmapped and low-quality reads that are overlooked by most existing methods. Both simulated and real NGS data from the 1000 Genomes Project were used to test the performance of SRinversion and of five published methods on the same data. The comparison shows that SRinversion achieves the highest specificity and sensitivity on both data sets. It is also the only algorithm able to detect inversions smaller than 50 bp in the real data.
In addition, a fraction of genomic regions, such as those with repetitive sequences, can hardly be covered by certain types of NGS data. In the second part, therefore, different types of sequencing data were compared to illustrate their advantages and shortcomings in variant calling. The results offer guidance to help researchers choose sequencing plans that are most suitable for their projects and that also have more power to detect various types of variants in complex regions.
This thesis focuses on two areas that are significant yet overlooked by existing studies of NGS data analysis. By introducing a new method for inversion detection and a comprehensive comparison of different NGS data types, the results presented here should cast light on the twilight zones of NGS-based analyses and contribute to genetic research that makes use of this sequencing technology.
|
Degree | Doctor of Philosophy |
Subject | Medical genetics; Nucleotide sequence |
Dept/Program | Paediatrics and Adolescent Medicine |
Persistent Identifier | http://hdl.handle.net/10722/250733 |
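
The abstract above describes detecting short inversions from split reads. As a rough illustration of the underlying signal only, the following minimal Python sketch checks whether a read that disagrees with the reference over one contiguous block can be explained by that block being reverse-complemented. This is not SRinversion's algorithm: the function names, the `min_len` parameter, and the assumption that the read is already placed on a reference window of equal length are all hypothetical simplifications introduced here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_short_inversion(reference: str, read: str, min_len: int = 4):
    """Toy check for an inversion signature in a single read.

    Assumes (hypothetically) that `read` is already anchored to a reference
    window of the same length. Returns the half-open interval (start, end)
    of the mismatching block if that block equals the reverse complement of
    the reference over the same interval, otherwise None.
    """
    if len(read) != len(reference) or read == reference:
        return None

    n = len(reference)

    # Extend the matching prefix from the left.
    left = 0
    while left < n and reference[left] == read[left]:
        left += 1

    # Extend the matching suffix from the right.
    right = n
    while right > left and reference[right - 1] == read[right - 1]:
        right -= 1

    # Very short blocks are indistinguishable from SNPs or sequencing errors.
    if right - left < min_len:
        return None

    # The inversion signature: the block reads as the reference's reverse complement.
    if read[left:right] == reverse_complement(reference[left:right]):
        return (left, right)
    return None


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTACCGTA"
    # Simulate a read carrying an 8 bp inversion of ref[6:14].
    inverted_read = ref[:6] + reverse_complement(ref[6:14]) + ref[14:]
    print(find_short_inversion(ref, inverted_read))  # prints (6, 14)
```

In this toy setting the reported interval can be narrower than the true inversion when the inverted bases at its edges happen to match the reference, which is one reason breakpoints of very short inversions are hard to pin down from short reads.
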
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Chen, Ruoyan | - |
dc.contributor.author | 陈若言 | - |
dc.date.accessioned | 2018-01-26T01:59:24Z | - |
dc.date.available | 2018-01-26T01:59:24Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Chen, R. [陈若言]. (2017). A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/250733 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Medical genetics | - |
dc.subject.lcsh | Nucleotide sequence | - |
dc.title | A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991043982882403414 | - |
dc.date.hkucongregation | 2017 | - |
dc.identifier.mmsid | 991043982882403414 | - |