
postgraduate thesis: Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria

Title: Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria
Authors: Pei, Yao (裴瑶)
Advisor(s): Lam, TY; Guan, Y
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Pei, Y. [裴瑶]. (2023). Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Antibiotic resistance is a substantial and persistent global threat to human health, exacerbated by the emergence of multidrug-resistant bacteria. Antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) are key determinants of bacterial resistance and its spread across environments. Rapid and robust identification and classification of ARGs and MGEs can inform treatment decisions for patients with resistant infections and support surveillance of environmental pollutants to assess outbreak risks. However, standard laboratory testing methods for antibiotic resistance screening are labour-intensive, biased, slow, and insensitive. Advances in sequencing technologies and deep learning algorithms have provided new opportunities to monitor ARGs and MGEs more comprehensively and efficiently. In this thesis, I developed bioinformatics methods that use deep neural networks to identify and classify ARGs, integrons and transposons from different types of genetic sequences (i.e., both long and short sequences, and both amino acid and nucleotide sequences). A comprehensive ARG sequence database was built by merging six public ARG databases with extensive curation. A deep neural network (named ARGNet), composed of an autoencoder and a convolutional neural network, was developed to detect ARGs in genetic sequences and classify them into 36 antibiotic categories. ARGNet outperformed other deep learning models in most application scenarios, with an average precision, recall, F1 score and accuracy of 99.0%, 93.8%, 96.3% and 96.3%, respectively, demonstrating its superiority in terms of flexibility, efficiency, and consistency. Integrons and unit transposons are two representative types of MGEs in bacteria. However, relatively limited software and database resources were available to study their diversity and dynamics.
Using bioinformatics and phylogenetics methods, I constructed comprehensive sequence databases and proposed systematic classification schemes (which consider the genes’ evolutionary contexts) for both types of MGEs. I also extensively studied their global distribution and association with ARGs. Two deep learning models, INTNet and TNet, were developed to identify and classify integrons and unit transposons based on the genetic sequences of their key proteins. Multi-task, multi-label deep learning frameworks were deployed to enable simultaneous prediction of their attributes, including bacterial hosts, environments of origin, and associated ARGs. INTNet and TNet performed consistently well in terms of precision (> 90% and > 99%), recall (> 91% and > 98%) and accuracy (> 94% and > 99%), respectively, across different evaluation tests. The three deep learning models, ARGNet, INTNet, and TNet, were developed in the same deep learning framework and could be integrated into a multi-functional platform for studying ARGs and MGEs. Their successful application to real-world metagenomic sequencing data demonstrated their feasibility. The three models will be released as standalone programs to the open-source community; ARGNet is already available as a web application (https://argnet.hku.hk/), and web applications will be developed for INTNet and TNet. ARGNet, INTNet and TNet, together with the underpinning sequence databases and phylogenetics-based classification schemes, provide insights into the application of deep learning bioinformatics methods in AMR surveillance and can inspire future research to combat antimicrobial resistance.
Degree: Doctor of Philosophy
Subject: Drug resistance in microorganisms - Genetic aspects
Mobile genetic elements
Deep learning (Machine learning)
Dept/Program: Public Health
Persistent Identifier: http://hdl.handle.net/10722/344416

 

DC Field | Value | Language
dc.contributor.advisor | Lam, TY | -
dc.contributor.advisor | Guan, Y | -
dc.contributor.author | Pei, Yao | -
dc.contributor.author | 裴瑶 | -
dc.date.accessioned | 2024-07-30T05:00:45Z | -
dc.date.available | 2024-07-30T05:00:45Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Pei, Y. [裴瑶]. (2023). Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/344416 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Drug resistance in microorganisms - Genetic aspects | -
dc.subject.lcsh | Mobile genetic elements | -
dc.subject.lcsh | Deep learning (Machine learning) | -
dc.title | Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Public Health | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044836040103414 | -
