Conference Paper: IDDF2023-ABS-0267 Strain-resolved taxonomic profiling and functional prediction of human microbiota using Strain2bFunc
Title | IDDF2023-ABS-0267 Strain-resolved taxonomic profiling and functional prediction of human microbiota using Strain2bFunc |
---|---|
Authors | Huang, Shi; Zhang, Yufeng; Liu, Jiang; Wang, Xiuping; Zhou, Lisa; Wang, Shi |
Issue Date | 6-Jul-2023 |
Abstract | Background Strain-level profiling of human microbiota remains challenging for conventional metagenomic methods (IDDF2023-ABS-0267 Figure 1. Research context: low-biomass microbiome analysis broadly links to clinical challenges/opportunities, IDDF2023-ABS-0267 Table 1), such as marker-gene-based analysis and whole metagenomic sequencing (WMS). Recently, microbiotas have been detected in various internal tissues, highlighting their crucial involvement in the pathogenesis of chronic GI diseases. However, because these samples have low microbial biomass and high host contamination, accurate microbial identification and functional analysis become far more difficult or even infeasible with currently available methods. Methods To address these challenges, we developed a reduced metagenome sequencing strategy (2bRAD) that uses type IIB restriction digestion to capture only ~1% of the target metagenome for microbiome analysis. Because this yields a new kind of sequence dataset, we built a new bioinformatic pipeline (Strain2bFunc), equipped with a machine-learning method, for strain-level profiling (IDDF2023-ABS-0267 Figure 2 (A) Research questions. Strain2bFunc aims to resolve strain compositions in various microbiotas at a low cost. (B) Research objectives: its performance will be evaluated with in silico simulation metagenomes, mock-community DNA samples, and real clinical samples, IDDF2023-ABS-0267 Figure 3. Illustration of the Strain2bFunc workflow. It contains three major modules: 2bRAD sequencing, strain identification, and functional prediction. 1. The 2bRAD sequencing protocol will generate millions of short and iso-length 2b tags for each bio-sample. 2. We first create a reduced version of the full reference genome database (GTDB + FungiDB), namely 2bGDB. Strain identification will be performed based on a deconvolution method using a 2bRAD copy number matrix and read-count matrix obtained from 2bRAD sequencing. 3. 
Functional profile for each sample will be computed based on the strain-level abundance matrix and prebuilt KO copy-number matrix for each strain/genome). Results We tested its profiling performance on in silico simulation metagenomes, DNA mock communities simulating both normal (e.g., gut microbiome) and high-host scenarios (e.g., tissue microbiome), and real clinical samples, with reference to marker-gene-based or WMS methods (IDDF2023-ABS-0267 Figure 2). Using the in silico simulation, we demonstrated that Strain2bFunc reaches a far higher resolution in strain-level identification (i.e., genome-wide difference <0.001) than 16S, similar to that of WMS (IDDF2023-ABS-0267 Figure 4 (A) Dendrograms display hierarchical clustering of 27 Cutibacterium acnes (CA) genomes using the whole genome, reduced metagenome (2bRAD) and 16S rRNA gene. (B) The correlation of genetic distance between each pair of CA genomes measured by the whole genome, reduced metagenome (2bRAD) and 16S rRNA gene. (C) Benchmark ‘Strain2bFunc’ (reduced-metagenome data) against QIIME2 (16S rRNA data) using a simulation metagenome mixed with 27 CA genomes in equal abundance). With DNA mock communities, we found that 16S-based profiling of high-host samples suffered from substantial false positives and low technical reproducibility, whereas Strain2bFunc produced highly accurate profiling results with high technical stability (IDDF2023-ABS-0267 Figure 4, 
IDDF2023-ABS-0267 Figure 6. The 2bRAD sequencing method excelled in taxonomy profiling of mock communities at the (A) genus and (B) species level, while the 16S method introduced false positives in microbial identification and huge PCR bias in abundance estimation. A mock sample is a mixture of human DNA (90% or 99%) and bacterial DNA of MSA1002 (10% or 1%). AUPR measures the microbial identification accuracy, while L2 similarity measures the similarity between the true taxonomic abundance profile and the one predicted by the 16S rRNA or 2bRAD sequencing method. Two technical replicates of each sequencing method (16S rRNA and 2bRAD) were included for each mock sample). Strain2bFunc enables us to predict the functional potential of a microbiome using whole-genome-based KEGG database annotations (IDDF2023-ABS-0267 Figure 5. The strain-level composition and functional profiles of in silico simulation metagenomes of human oral, gut, vaginal, skin and built environment. (A) The strain-level composition profiles and identification accuracy measured by AUPR based on the ground-truth profiles. (B) The second-level KEGG BRITE classification of functions for each microbiota). For real samples, we derived the fungi-to-bacteria and human-to-microbe ratios from 2bRAD sequence data, which exhibited great biological significance in the clinical studies. Furthermore, high-resolution strain-level profiling with Strain2bFunc provided great opportunities to analyze microbiome translocation and better explain the biological variation in hosts. 
Conclusions We established Strain2bFunc for efficient strain-resolved profiling and functional analysis of the microbiome, particularly for low-biomass clinical samples of broad interest in gastroenterology or hepatology. |
Persistent Identifier | http://hdl.handle.net/10722/333917 |
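The abstract describes strain identification as a deconvolution over a 2bRAD copy-number matrix and a read-count matrix, and functional prediction as a combination of the strain-level abundance matrix with a prebuilt KO copy-number matrix. The following is a minimal sketch of that idea, not the Strain2bFunc implementation. The solver (multiplicative updates for non-negative least squares), the function names, the tag and strain counts, and the placeholder KO labels `KO_a` and `KO_b` are all assumptions made for illustration.

```python
"""Toy sketch of strain deconvolution and functional prediction.

All matrices, names, and the solver are illustrative assumptions,
not the actual Strain2bFunc implementation.
"""


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def deconvolve(copy_number, read_counts, iterations=500):
    """Estimate non-negative strain abundances x with copy_number @ x ~ read_counts.

    Uses multiplicative updates for non-negative least squares, an assumed
    stand-in for whatever deconvolution method the pipeline really uses.
    """
    c_t = transpose(copy_number)
    target = matvec(c_t, read_counts)  # C^T y, fixed across iterations
    x = [1.0] * len(c_t)               # positive starting guess, one per strain
    for _ in range(iterations):
        fitted = matvec(c_t, matvec(copy_number, x))  # C^T C x
        # Rescale each abundance by the ratio of observed to fitted signal.
        x = [xi * t / (f + 1e-12) for xi, t, f in zip(x, target, fitted)]
    return x


def normalize(values):
    """Scale values so that they sum to 1 (relative abundances)."""
    total = sum(values)
    return [v / total for v in values]


def functional_profile(ko_copy_number, abundances):
    """Abundance-weighted KO copy number for one sample."""
    return {
        ko: sum(c * a for c, a in zip(copies, abundances))
        for ko, copies in ko_copy_number.items()
    }


# Hypothetical copy-number matrix: rows are 2b tags, columns are strains.
COPY_NUMBER = [
    [1, 0, 0],  # tag unique to strain 1
    [0, 2, 0],  # tag present twice in strain 2
    [0, 0, 1],  # tag unique to strain 3
    [1, 1, 0],  # tag shared by strains 1 and 2
    [0, 1, 1],  # tag shared by strains 2 and 3
]
# Hypothetical noise-free read counts for one sample, one entry per tag.
READ_COUNTS = [30, 20, 60, 40, 70]
# Hypothetical KO copy numbers per strain (placeholder KO labels).
KO_COPY_NUMBER = {"KO_a": [1, 0, 2], "KO_b": [0, 3, 1]}

RELATIVE_ABUNDANCE = normalize(deconvolve(COPY_NUMBER, READ_COUNTS))
PROFILE = functional_profile(KO_COPY_NUMBER, RELATIVE_ABUNDANCE)

if __name__ == "__main__":
    print("relative abundance:", [round(a, 4) for a in RELATIVE_ABUNDANCE])
    print("functional profile:", {k: round(v, 4) for k, v in PROFILE.items()})
```

With these noise-free toy counts, the estimate should land close to relative abundances of 0.3, 0.1 and 0.6 for the three strains. Real data would add sequencing noise, host reads, and a far larger tag-by-genome matrix, so a production pipeline would likely need a sparse solver and a filtering step that this sketch omits.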
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Huang, Shi | - |
dc.contributor.author | Zhang, Yufeng | - |
dc.contributor.author | Liu, Jiang | - |
dc.contributor.author | Wang, Xiuping | - |
dc.contributor.author | Zhou, Lisa | - |
dc.contributor.author | Wang, Shi | - |
dc.date.accessioned | 2023-10-06T08:40:13Z | - |
dc.date.available | 2023-10-06T08:40:13Z | - |
dc.date.issued | 2023-07-06 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333917 | - |
dc.language | eng | - |
dc.relation.ispartof | International Digestive Disease Forum (10/06/2023-11/06/2023, Hong Kong) | - |
dc.title | IDDF2023-ABS-0267 Strain-resolved taxonomic profiling and functional prediction of human microbiota using Strain2bFunc | - |
dc.type | Conference_Paper | - |
dc.identifier.doi | 10.1136/gutjnl-2023-IDDF.106 | - |
dc.identifier.issue | 72 | - |
dc.identifier.spage | A120 | - |
dc.identifier.epage | A123 | - |