Appears in Collections:
postgraduate thesis: Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria
Title | Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria |
---|---|
Authors | Pei, Yao (裴瑶) |
Advisors | Lam, TY; Guan, Y |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Pei, Y. [裴瑶]. (2023). Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Antibiotic resistance is a substantial and persistent global threat to human health, exacerbated by the emergence of multidrug-resistant bacteria. Antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) are key determinants of bacterial resistance and its spread across environments. Rapid and robust identification and classification of ARGs and MGEs can contribute to treatment decisions for patients with resistant infections and to the surveillance of environmental pollutants for assessing outbreak risks. However, the standard laboratory testing methods for antibiotic resistance screening are labour-intensive, biased, slow, and insensitive. Advances in sequencing technologies and deep learning algorithms have provided new opportunities for monitoring ARGs and MGEs more comprehensively and efficiently. In this thesis, I have developed bioinformatics methods that use deep neural networks to identify and classify ARGs, integrons and transposons from different types of genetic sequences (i.e., both long and short sequences, amino acid and nucleotide sequences).
A comprehensive ARG sequence database was developed by merging six public ARG databases with extensive curation. A deep neural network (named ARGNet), composed of an autoencoder and a convolutional neural network, was developed to detect ARGs in genetic sequences and classify them into 36 antibiotic categories. ARGNet outperformed other deep learning models in most of the application scenarios, with an average precision, recall, F1 score and accuracy of 99.0%, 93.8%, 96.3% and 96.3%, respectively, demonstrating its superiority in terms of flexibility, efficiency, and consistency.
Integrons and unit transposons are two representative types of MGEs in bacteria. However, relatively few software and database resources have been available to study their diversity and dynamics. Using bioinformatics and phylogenetics methods, comprehensive sequence databases were constructed and systematic classification schemes (which consider the genes’ evolutionary contexts) were proposed for both types of MGEs. I also extensively studied their global distribution and association with ARGs. Two deep learning models, INTNet and TNet, were developed to identify and classify integrons and unit transposons based on genetic sequences of their key proteins. Multi-task, multilabel deep learning frameworks were deployed to enable simultaneous prediction of their attributes, including bacterial hosts, environments of origin, and associated ARGs. INTNet and TNet, respectively, performed consistently well in terms of precision (> 90% and > 99%), recall (> 91% and > 98%) and accuracy (> 94% and > 99%) across different evaluation tests.
The three deep learning models, ARGNet, INTNet, and TNet, were developed in the same deep learning framework and could be integrated into a multi-functional platform for studying ARGs and MGEs. Their successful application to real-world metagenomic sequencing data demonstrated their feasibility. The three models will be released to the open-source community as standalone programs. ARGNet is already available through a web application (https://argnet.hku.hk/), and web applications will also be developed for INTNet and TNet.
ARGNet, INTNet and TNet, together with the underpinning sequence databases and phylogenetics-based classification schemes, provide insights into the application of deep learning bioinformatics methods in AMR surveillance and may inform future research to combat antimicrobial resistance. |
Degree | Doctor of Philosophy |
Subject | Drug resistance in microorganisms - Genetic aspects; Mobile genetic elements; Deep learning (Machine learning) |
Dept/Program | Public Health |
Persistent Identifier | http://hdl.handle.net/10722/344416 |
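
Note on the reported metrics: the abstract gives an average precision of 99.0%, recall of 93.8% and F1 score of 96.3% for ARGNet. The sketch below is a minimal consistency check, assuming the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` is illustrative and is not taken from the thesis or its software, and the thesis may average F1 per class or per scenario rather than derive it from the averaged precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall reported for ARGNet in the abstract.
argnet_f1 = f1_score(0.990, 0.938)
print(f"{argnet_f1:.3f}")
```

Under this assumption the result rounds to 0.963, which agrees with the F1 score stated in the abstract.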
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lam, TY | - |
dc.contributor.advisor | Guan, Y | - |
dc.contributor.author | Pei, Yao | - |
dc.contributor.author | 裴瑶 | - |
dc.date.accessioned | 2024-07-30T05:00:45Z | - |
dc.date.available | 2024-07-30T05:00:45Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Pei, Y. [裴瑶]. (2023). Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/344416 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Drug resistance in microorganisms - Genetic aspects | - |
dc.subject.lcsh | Mobile genetic elements | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.title | Deep learning methods for identification and classification of antibiotic resistance genes and mobile genetic elements in bacteria | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Public Health | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044836040103414 | - |