
postgraduate thesis: A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities

Title: A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities
Authors: Chen, Ruoyan (陈若言)
Issue Date: 2017
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Chen, R. [陈若言]. (2017). A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Rapid development of next-generation sequencing (NGS) technology has substantially transformed the landscape of biomedical research. As a result, analyses based on these sequencing technologies, especially methods and frameworks for variant detection, are becoming more diverse and powerful. Together they provide enormous insight into population diversity and genetic diseases, and are rapidly moving the field of personalized medicine forward. Nevertheless, despite the wide variety of platforms, data types, and mutation-identification strategies available for re-sequencing the human genome, there are still detection gaps in the spectrum of human genomic variation, as well as twilight zones in the human genome that remain untouched or overlooked. Among these, the identification of short inversions and of variants in repetitive regions are two important yet neglected areas. Accordingly, this thesis presents two analyses that aim to bring insight into the twilight zones of NGS-based analysis. The first analysis introduces a new framework, SRinversion, developed specifically to identify inversions smaller than 1 kb and particularly suited to the short reads of NGS data in their present form. A summary of public databases of genomic variation clearly indicates that the identification of inversions, especially those shorter than 100 bp, lags behind that of other types of variation. To fill this gap in the full spectrum of variants, SRinversion applies an improved split-read method to examine unmapped and low-quality reads that are overlooked by most existing methods. Both simulated and real NGS data from the 1000 Genomes Project were used to test the performance of SRinversion and of five published methods on the same data. The comparison shows that SRinversion achieves the highest specificity and sensitivity on both data sets. It is also the only algorithm able to detect inversions smaller than 50 bp on real data. In addition, a fraction of genomic regions, such as those with repetitive sequences, can hardly be covered by certain types of NGS data. Accordingly, in the second part, different types of sequencing data were compared to illustrate their advantages and shortcomings in variant calling. The comparison results offer guidance to help researchers choose the sequencing plans most suitable for their projects while gaining more power to detect various types of variants in complex regions. This thesis focuses on two areas that are significant yet overlooked by existing studies of NGS data analysis. By introducing a new method for inversion detection and a comprehensive comparison of different NGS data types, the results presented here should cast light on the twilight zones of NGS-based analysis and contribute to genetic research that makes use of this sequencing technology.
Degree: Doctor of Philosophy
Subject: Medical genetics
Nucleotide sequence
Dept/Program: Paediatrics and Adolescent Medicine
Persistent Identifier: http://hdl.handle.net/10722/250733

 

DC Field: Value
dc.contributor.author: Chen, Ruoyan
dc.contributor.author: 陈若言
dc.date.accessioned: 2018-01-26T01:59:24Z
dc.date.available: 2018-01-26T01:59:24Z
dc.date.issued: 2017
dc.identifier.citation: Chen, R. [陈若言]. (2017). A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/250733
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Medical genetics
dc.subject.lcsh: Nucleotide sequence
dc.title: A new tool for detecting short inversions using next generation sequencing (NGS) data and a systematic comparison of different NGS platforms on detection sensitivities
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Paediatrics and Adolescent Medicine
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.5353/th_991043982882403414
dc.date.hkucongregation: 2017
dc.identifier.mmsid: 991043982882403414
