
postgraduate thesis: Nanopore long-read variant calling

Title: Nanopore long-read variant calling
Authors: Zheng, Zhenxian (鄭鎮賢)
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation:
Zheng, Z. [鄭鎮賢]. (2023). Nanopore long-read variant calling. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Variant calling is a crucial step in genomic analysis, enabling the identification of genetic variants and somatic mutations in sequenced DNA reads. These variants provide insights into population genetics, disease progression, and cancer diagnosis. Oxford Nanopore Technologies (ONT) provides a third-generation sequencing platform with substantially improved throughput and flexibility. However, the high error rate inherent in Nanopore sequencing data makes accurate variant calling difficult and calls for new variant callers. Clair3 is a novel framework for long-read germline variant calling that integrates a pileup model and a full-alignment model. It addresses the limitations of Nanopore data by first detecting candidate variants from a summarized pileup input and then resolving low-quality pileup candidates with more complete, haplotype-resolved full-alignment representations. To improve performance, Clair3 adopts several strategies, including data-specific representation unification, network architecture optimization, and haplotype information aggregation. Clair3 outperforms other variant callers, requires fewer computing resources, and runs considerably faster than its competitors. It facilitates variant discovery in noisy Nanopore sequencing data while maintaining high calling efficiency, demonstrating its capacity for comprehensive genome-wide inference and applications. ClairS is a second framework, designed specifically for somatic variant calling, primarily from Nanopore sequencing reads. Somatic variant calling is more challenging than germline calling because of sample contamination, sequencing artifacts, and low alternative allele frequencies. ClairS introduces a synthetic approach that combines reads from different individuals to mimic tumor tissue, allowing manual control of synthetic proportions, coverages, and contamination levels.
Additionally, ClairS incorporates phasing information to detect low-frequency somatic variants. Its two modules, pileup and full-alignment, make independent decisions and are designed to cross-validate each other to filter out large numbers of false positives. Comprehensive analysis under various coverage, purity, and parameter settings demonstrates its robustness. ClairS compensates for the lack of adequate real tumor material, supporting cancer research and medical diagnosis. The frameworks proposed in this study provide innovative solutions to the challenges of Nanopore variant calling. Moreover, this thesis introduces a universal variant calling framework that caters to various sequencing platforms and downstream analyses. This framework highlights the value of deep learning techniques and phased haplotype information for improving long-read variant calling. Extensive experiments demonstrate the versatility and adaptability of our method across diverse scenarios. Overall, based on systematic benchmarking and extensive user evaluations, I believe our frameworks can make a significant contribution to the field of genomics and pave the way for future research into comprehensive and accurate long-read variant calling.
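The abstract describes Clair3's two-stage design: a pileup model proposes candidate variants, and only the low-quality candidates are re-examined by the more expensive full-alignment model. The Python sketch below illustrates that routing idea under stated assumptions. The `Candidate` record, the fixed `QUALITY_CUTOFF` of 20, and the caller-supplied `full_alignment_model` function are all hypothetical stand-ins, not Clair3's actual data structures or API, and Clair3 itself may choose which candidates go to full alignment by a different rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical cut-off (illustrative only): pileup candidates scoring below
# this value are handed to the slower full-alignment model.
QUALITY_CUTOFF = 20.0


@dataclass
class Candidate:
    """A hypothetical candidate site as produced by a pileup model."""
    chrom: str
    pos: int
    pileup_genotype: str
    pileup_quality: float


def route_candidates(
    candidates: List[Candidate], cutoff: float = QUALITY_CUTOFF
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (confident pileup calls, low-quality candidates)."""
    confident = [c for c in candidates if c.pileup_quality >= cutoff]
    low_quality = [c for c in candidates if c.pileup_quality < cutoff]
    return confident, low_quality


def call_variants(
    candidates: List[Candidate],
    full_alignment_model: Callable[[Candidate], str],
    cutoff: float = QUALITY_CUTOFF,
) -> Dict[Tuple[str, int], str]:
    """Keep confident pileup genotypes; re-genotype the rest with the
    caller-supplied full-alignment model. Returns calls sorted by site."""
    confident, low_quality = route_candidates(candidates, cutoff)
    calls = {(c.chrom, c.pos): c.pileup_genotype for c in confident}
    for cand in low_quality:
        calls[(cand.chrom, cand.pos)] = full_alignment_model(cand)
    return dict(sorted(calls.items()))


if __name__ == "__main__":
    cands = [
        Candidate("chr1", 100, "0/1", 35.0),
        Candidate("chr1", 250, "0/1", 8.0),
        Candidate("chr2", 50, "1/1", 22.5),
    ]
    # Stand-in model: pretend full alignment rejects every low-quality site.
    print(call_variants(cands, lambda c: "0/0"))
    # {('chr1', 100): '0/1', ('chr1', 250): '0/0', ('chr2', 50): '1/1'}
```

In this toy run only the `chr1:250` candidate (quality 8.0) is re-genotyped, while the two confident pileup calls pass through unchanged. The point being illustrated is that the costly model needs to run on only a subset of sites.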
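The abstract states that ClairS builds synthetic tumors by combining reads from different individuals, with controllable proportions and coverages. The arithmetic behind such a mixture can be sketched as follows, assuming the simplest scheme: reads from individual A serve as the normal-like background, reads from individual B supply the pseudo-somatic variants, and purity is the fraction of mixture reads drawn from B. The function names are hypothetical, this is not ClairS's actual pipeline, and contamination of the normal sample (also mentioned in the abstract) is not modelled.

```python
from typing import Tuple


def mixing_fractions(
    target_cov: float, purity: float, cov_normal: float, cov_tumor_like: float
) -> Tuple[float, float]:
    """Return the fractions of reads to subsample from individual A
    (normal-like, coverage `cov_normal`) and individual B (tumor-like,
    coverage `cov_tumor_like`) so the mixture reaches `target_cov`
    with `purity` of its reads coming from B."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    frac_b = purity * target_cov / cov_tumor_like
    frac_a = (1.0 - purity) * target_cov / cov_normal
    if frac_a > 1.0 or frac_b > 1.0:
        raise ValueError("source coverage is too low for the requested mixture")
    return frac_a, frac_b


def expected_vaf(purity: float, zygosity: str) -> float:
    """Expected allele fraction, in the mixture, of a variant present in B
    but absent from A (treated as a pseudo-somatic variant)."""
    if zygosity not in ("het", "hom"):
        raise ValueError("zygosity must be 'het' or 'hom'")
    return purity * (0.5 if zygosity == "het" else 1.0)


if __name__ == "__main__":
    # 50x synthetic tumor at 40% purity from a 60x A and an 80x B sample.
    frac_a, frac_b = mixing_fractions(50, 0.4, 60, 80)
    print(frac_a, frac_b)  # 0.5 0.25
    print(expected_vaf(0.4, "het"))  # 0.2
```

Under this scheme, a variant heterozygous in B and absent from A appears at an expected allele fraction of purity / 2. Lowering the purity therefore directly produces the low-frequency variants that somatic callers must detect.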
Degree: Doctor of Philosophy
Subject: Nanopores; Nucleotide sequence; Variation (Biology)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/335141

 

DC Field | Value | Language
dc.contributor.author | Zheng, Zhenxian | -
dc.contributor.author | 鄭鎮賢 | -
dc.date.accessioned | 2023-11-13T07:44:53Z | -
dc.date.available | 2023-11-13T07:44:53Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Zheng, Z. [鄭鎮賢]. (2023). Nanopore long-read variant calling. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/335141 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Nanopores | -
dc.subject.lcsh | Nucleotide sequence | -
dc.subject.lcsh | Variation (Biology) | -
dc.title | Nanopore long-read variant calling | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044736606903414 | -
