postgraduate thesis: Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning
Title | Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning |
---|---|
Authors | Zhang, Yifan (張亦凡) |
Advisors | Lam, TW |
Issue Date | 2022 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, Y. [張亦凡]. (2022). Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. Meanwhile, thanks to the rapid development of deep learning technologies, more and more previously impractical or computationally heavy tasks have been made possible or easier. Despite the great success of deep learning in numerous fields (e.g., image recognition), the application of deep learning algorithms in bioinformatics, especially in genome assembly, is still scarce. This thesis presents two deep learning-based tools, aiming to improve the quality of genome assemblies from a micro and macro perspective, respectively. CONNET is an accurate genome consensus tool. Genome consensus, which is essential to correct a draft assembly by resolving the discrepancies in the reads, is computationally intensive. In recent years, efficient consensus tools have emerged based on partial-order alignment. We discovered that the spatial relationship of alignment pileup, which could be utilized by deep learning, is crucial to high-quality consensus. CONNET showed the highest accuracy of any existing method. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. M-NET is the first reference-free misassembly detector for Nanopore sequencing data. Misassemblies are usually assessed with the help of a reference genome, which is not available during de novo assembly. M-NET predicts the presence of misassemblies solely based on the alignment pileup of raw reads to the assembly. |
Degree | Master of Philosophy |
Subject | Genomics; Nanopores; Deep learning (Machine learning) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/322957 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lam, TW | - |
dc.contributor.author | Zhang, Yifan | - |
dc.contributor.author | 張亦凡 | - |
dc.date.accessioned | 2022-11-18T10:42:08Z | - |
dc.date.available | 2022-11-18T10:42:08Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Zhang, Y. [張亦凡]. (2022). Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/322957 | - |
dc.description.abstract | Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. Meanwhile, thanks to the rapid development of deep learning technologies, more and more previously impractical or computationally heavy tasks have been made possible or easier. Despite the great success of deep learning in numerous fields (e.g., image recognition), the application of deep learning algorithms in bioinformatics, especially in genome assembly, is still scarce. This thesis presents two deep learning-based tools, aiming to improve the quality of genome assemblies from a micro and macro perspective, respectively. CONNET is an accurate genome consensus tool. Genome consensus, which is essential to correct a draft assembly by resolving the discrepancies in the reads, is computationally intensive. In recent years, efficient consensus tools have emerged based on partial-order alignment. We discovered that the spatial relationship of alignment pileup, which could be utilized by deep learning, is crucial to high-quality consensus. CONNET showed the highest accuracy of any existing method. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. M-NET is the first reference-free misassembly detector for Nanopore sequencing data. Misassemblies are usually assessed with the help of a reference genome, which is not available during de novo assembly. M-NET predicts the presence of misassemblies solely based on the alignment pileup of raw reads to the assembly. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Genomics | - |
dc.subject.lcsh | Nanopores | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.title | Accurate genome consensus and misassembly detection in assembling nanopore sequencing data via deep learning | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2022 | - |
dc.identifier.mmsid | 991044609096803414 | - |