postgraduate thesis: Development of NGS4THAL, a one-stop molecular diagnosis and carrier screening tool by next-generation sequencing for thalassaemia
Title | Development of NGS4THAL, a one-stop molecular diagnosis and carrier screening tool by next-generation sequencing for thalassaemia |
---|---|
Authors | Cao Yujie (曹玉杰) |
Advisors | Yang, W; Lau, YL |
Issue Date | 2021 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Cao Yujie, [曹玉杰]. (2021). Development of NGS4THAL, a one-stop molecular diagnosis and carrier screening tool by next-generation sequencing for thalassaemia. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Thalassaemia is caused by deficient production of haemoglobin molecules and is one of the major health threats worldwide, especially in many tropical and subtropical regions. Traditionally, molecular diagnosis of thalassaemia starts with haematological tests and haematological electrophoresis, followed by genetic tests to identify the exact pathogenic variant(s). This requires cumbersome laboratory work and step-by-step decision making that depends on the circumstances, which makes it difficult to scale up, particularly for expanded population screening. Next-generation sequencing (NGS) is rapidly becoming a powerful alternative, but sequence homology in the haemoglobin regions complicates variant detection by NGS. Thus, an accurate, efficient and scalable NGS-based tool is urgently needed for molecular diagnosis and carrier screening of thalassaemia.
To achieve this goal, a one-stop bioinformatics analysis pipeline named NGS4THAL was developed in this study for detecting thalassaemia pathogenic variants by NGS. NGS4THAL is designed to deal with issues caused by haemoglobin sequence homology. Its key features are as follows: firstly, informative NGS reads with multiple alignments located within HBA2-HBA1-HBAP1, HBZ-HBZP1, and HBG1-HBG2 are recovered and realigned for better detection of point mutations and small InDels; secondly, a combination of complementary structural variant (SV) callers based on read-pair, split-read and read-depth methods is tailored for better detection of various types of haemoglobin SVs; thirdly, a two-stage process was developed for the detection of compound haemoglobin SVs (such as --SEA/-α3.7). NGS4THAL was systematically validated using various types of NGS data, including simulation data, real-world targeted sequencing data, whole genome sequencing (WGS) data, and whole exome sequencing (WES) data.
When tested with simulation data, NGS4THAL achieved a sensitivity of 98.48% for pathogenic point mutations and small InDels, a sensitivity of 98.40% for haemoglobin SVs, and a specificity of 100% for all these mutation types. When tested on samples sequenced on the targeted sequencing platform whose thalassaemia pathogenic variants had been confirmed by laboratory methods, NGS4THAL achieved a 100% detection rate, including detection of Hb CS, Hb QS, -28 (A>G), CD 41/42 (-CTTT), IVS II-654 C>T, CD 71/72 (+A), --SEA, -3.7, -4.2, and the compound rearrangements --SEA/-4.2, --SEA/-3.7, and --SEA/anti4.2, with no false-positive calls of pathogenic variants. Applied to WGS data from a Hong Kong Chinese cohort (N=375) collected for a Hirschsprung’s disease study, NGS4THAL detected a carrier rate of 11.7%, with a mutational spectrum consistent with epidemiological reports. Similarly, a carrier rate of 14.9% was detected for a Northern Vietnamese cohort (N=161) from the same study. In addition, a risk haplotype, G(rs2685118) - G(rs1203957) - A(rs3918352) - T(rs11642609) - G(rs1203974) - A(rs8045291), was found to be associated with the --SEA deletion (P-value < 2.2e-16, OR = 114.61), with a sensitivity of 95.45% and a specificity of 84.51% as a surrogate for --SEA.
In summary, NGS4THAL is a highly accurate and efficient tool for analysing NGS data for molecular diagnosis and carrier screening of thalassaemia, and it highlights the necessity of sequence-specific data analysis strategies in mutation detection. It could play a role in moving population screening of thalassaemia forward in Hong Kong P.R.C and other regions severely affected by the disease. |
Degree | Doctor of Philosophy |
Subject | Thalassemia - Diagnosis; Nucleotide sequence |
Dept/Program | Paediatrics and Adolescent Medicine |
Persistent Identifier | http://hdl.handle.net/10722/310685 |
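
The abstract reports sensitivity, specificity and an odds ratio (OR) for the risk haplotype used as a surrogate for the --SEA deletion. As a rough illustration of how such figures relate to a 2x2 table of marker calls against confirmed status, here is a minimal Python sketch. The class name, function names and example counts are hypothetical; this record does not give the underlying counts, so the numbers below are not the thesis data and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of surrogate-marker calls against confirmed variant status."""
    tp: int  # marker present, variant present
    fn: int  # marker absent, variant present
    fp: int  # marker present, variant absent
    tn: int  # marker absent, variant absent


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true variant carriers that the marker flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-carriers that the marker correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def odds_ratio(c: ConfusionCounts) -> float:
    """Cross-product odds ratio (tp * tn) / (fp * fn) of the 2x2 table."""
    if c.fp == 0 or c.fn == 0:
        raise ValueError("odds ratio is undefined when fp or fn is zero")
    return (c.tp * c.tn) / (c.fp * c.fn)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=40, fn=10, fp=5, tn=45)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.2%}")
    print(f"specificity = {specificity(example):.2%}")
    print(f"odds ratio  = {odds_ratio(example):.2f}")
```

With these made-up counts the sensitivity is 40/50, the specificity is 45/50, and the odds ratio is (40 × 45)/(5 × 10). A high odds ratio alongside a more moderate specificity, as reported in the abstract, is consistent with a marker that rarely misses the variant but also appears in some non-carriers.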
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yang, W | - |
dc.contributor.advisor | Lau, YL | - |
dc.contributor.author | Cao Yujie | - |
dc.contributor.author | 曹玉杰 | - |
dc.date.accessioned | 2022-02-08T11:54:10Z | - |
dc.date.available | 2022-02-08T11:54:10Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Cao Yujie, [曹玉杰]. (2021). Development of NGS4THAL, a one-stop molecular diagnosis and carrier screening tool by next-generation sequencing for thalassaemia. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/310685 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Thalassemia - Diagnosis | - |
dc.subject.lcsh | Nucleotide sequence | - |
dc.title | Development of NGS4THAL, a one-stop molecular diagnosis and carrier screening tool by next-generation sequencing for thalassaemia | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2021 | - |
dc.identifier.mmsid | 991044360599103414 | - |