
postgraduate thesis: Deep computational analysis of metagenomic data in taxonomic and functional dimensions

Title: Deep computational analysis of metagenomic data in taxonomic and functional dimensions
Authors: Yao, Haobin [姚皓彬]
Advisor(s): Yiu, SM; Lam, TW
Issue Date: 2019
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yao, H. [姚皓彬]. (2019). Deep computational analysis of metagenomic data in taxonomic and functional dimensions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Modern high-throughput sequencing technology enables researchers to extract DNA directly from microbial communities as metagenomic data, creating a need for computational tools that can efficiently and accurately analyze the vast number of whole-genome sequencing reads generated every day. This thesis presents our contributions to the taxonomic and functional analysis of metagenomic data. Taxonomic annotation is often a preliminary and critical task in a metagenomic analysis pipeline. Although existing tools based on k-mer mapping have made great progress in efficiency, performance degradation in the absence of closely related reference genomes remains a severe problem. We therefore developed MetaAnnotator and Taxasense, two novel tools that significantly outperform existing tools when no species-level reference is available. As a major breakthrough, MetaAnnotator rests on three core concepts: (i) similarity calculated from k-mers in protein-encoding regions of the references is more reliable; (ii) to determine the node level for taxonomic annotation, we compute probabilistic models for every pair of genome and taxonomy in the reference database; (iii) we adopt a BWT index to accelerate k-mer search queries. Taxasense accelerates MetaAnnotator by adopting a wavelet-tree index. It maintains the performance advantage of MetaAnnotator while also being applicable to raw reads and short contigs. The other dimension of metagenomics we address is the discovery of antibiotic resistance genes (ARGs) in metagenomic data. To tackle the problem of inconsistent results between different ARG databases, we conduct an in-depth review of existing databases and identify the reasons for the inconsistency. Additionally, we propose methods to reduce the expected error rate for CARD, currently the most widely used ARG database.
Degree: Doctor of Philosophy
Subject: Metagenomics - Data processing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/281605

 

DC Field | Value | Language
dc.contributor.advisor | Yiu, SM | -
dc.contributor.advisor | Lam, TW | -
dc.contributor.author | Yao, Haobin | -
dc.contributor.author | 姚皓彬 | -
dc.date.accessioned | 2020-03-18T11:33:03Z | -
dc.date.available | 2020-03-18T11:33:03Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Yao, H. [姚皓彬]. (2019). Deep computational analysis of metagenomic data in taxonomic and functional dimensions. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/281605 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Metagenomics - Data processing | -
dc.title | Deep computational analysis of metagenomic data in taxonomic and functional dimensions | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_991044214993403414 | -
dc.date.hkucongregation | 2020 | -
dc.identifier.mmsid | 991044214993403414 | -
