
postgraduate thesis: Development of tailored NGS data analysis pipeline for the diagnosis of neuromuscular disorders

Title: Development of tailored NGS data analysis pipeline for the diagnosis of neuromuscular disorders
Authors: Lei, Yao [雷尧]
Advisor(s): Chan, HSS; Yang, W
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Lei, Y. [雷尧]. (2023). Development of tailored NGS data analysis pipeline for the diagnosis of neuromuscular disorders. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Neuromuscular diseases (NMDs) are a group of rare diseases that affect the normal function of neurons, muscles, and neuromuscular junctions. NMDs are genetically and clinically heterogeneous. Conventional technologies for the genetic diagnosis of NMD include linkage analysis, Sanger sequencing, multiplex ligation-dependent probe amplification (MLPA), and array comparative genomic hybridization. However, the diagnostic rate is only about 31-47% because these technologies focus on a small number of targeted genes or regions. Next-generation sequencing (NGS) is a high-throughput sequencing technology that covers more, or all, regions of the human genome, and it accelerates the diagnosis of NMD patients. In most studies using NGS, the diagnostic rate is about 40-60%, so many patients still lack a genetic diagnosis. This study analyzed 30 patients from 25 families who had previously undergone genetic analysis without positive findings. It aimed to develop a tailored NGS data analysis pipeline to increase the diagnostic rate in this hard-to-diagnose cohort, and to investigate the factors that affect diagnosis in this cohort, such as sequencing strategy, analysis strategy, updated public databases, and variant type. Whole-exome sequencing (WES) and/or whole-genome sequencing (WGS) was performed on the 30 patients, and a new NMD NGS data analysis pipeline was developed using the latest analysis algorithms, updated public databases, and clinical information. In the pipeline, the GATK Best Practices workflow (v4.1.8.1) was used to call SNVs/indels; GATK gCNV and CNVnator were used to call CNVs; and MToolBox, mtDNA-Server, and Mutect2 were used to call mtDNA variants. A traditional annotation-filtration strategy (ANNOVAR) and a newer prioritization strategy (HPO and Exomiser) were both used in the SNV/indel analysis. Exomiser combines the phenotype and genotype of each patient to prioritize variants by Exomiser score.
Thirteen patients from 11 families had potential causal variant(s) identified, accounting for 43% of all patients. TTN was the gene that most commonly contained potential causal variants in this study (8 patients from 6 families). Four meta-transcript-only variants and five C-terminal variants were identified in TTN. Two siblings presented with arthrogryposis multiplex congenita, another two siblings had a calpainopathy-like presentation, and two unrelated patients presented with limb-girdle muscular dystrophy. The correlation between the different clinical presentations and the types of mutation is worth exploring. In addition, a copy number variant and a frameshift variant in CACNA1S were identified as the potential causal variants in one patient with severe congenital myopathy. Variants in TBCK were identified in a patient with significant intellectual disability, epilepsy, and myopathy, consistent with a severe autosomal recessive congenital developmental disorder with profound hypotonia, developmental delay, slow motor progression, and epilepsy. A heterozygous deep intronic variant was found in DMD in a patient with a compatible clinical presentation and a diagnostic muscle biopsy, in whom prior genetic testing had identified no exon deletion or point mutation. A heterozygous pathogenic variant in MFN2 was identified in a patient with motor and sensory axonal polyneuropathy. Another patient, with compound heterozygous POMT1 variants, presented with mild intellectual disability and hyperCKaemia but no major muscle weakness. Seven missense variants were identified in 6 patients. In conclusion, with this new analysis pipeline, 43% of these hard-to-diagnose patients had potential causal variants identified. The pipeline significantly increased the diagnostic rate and accelerated the diagnostic process for neuromuscular disease. Missense variants are common in the human genome, and differentiating benign variants from pathogenic ones is complex. To confirm the pathogenicity of the identified variant(s), further analysis of transcriptomic and proteomic data is often needed.
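The two SNV/indel analysis strategies named in the abstract, and the 43% figure, can be illustrated with a small Python sketch. This is a toy illustration only and not the thesis pipeline: it does not call ANNOVAR or Exomiser. The `Variant` fields, the frequency cutoff, the consequence list, the `phenotype_score` values, and the placeholder gene names are all assumptions made up for the example. Only the counts passed to `diagnostic_yield` (13 of 30 patients, 11 of 25 families) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str               # placeholder gene label, not a real finding
    consequence: str        # e.g. "frameshift", "missense", "synonymous"
    population_af: float    # allele frequency in a reference population
    phenotype_score: float  # hypothetical 0-1 gene-phenotype match score


# Hypothetical thresholds chosen for illustration only.
MAX_POPULATION_AF = 0.01
DAMAGING = {"frameshift", "stop_gained", "splice_site", "missense"}


def annotation_filter(variants):
    """Annotation-filtration idea: keep rare, potentially damaging variants."""
    return [
        v for v in variants
        if v.population_af <= MAX_POPULATION_AF and v.consequence in DAMAGING
    ]


def prioritize(variants):
    """Prioritization idea: rank candidates by phenotype match, best first."""
    return sorted(variants, key=lambda v: v.phenotype_score, reverse=True)


def diagnostic_yield(solved, total):
    """Percentage of cases with a potential causal variant, rounded."""
    return round(100 * solved / total)


# Made-up variants for one imaginary patient.
variants = [
    Variant("GENE_A", "frameshift", 0.0001, 0.90),
    Variant("GENE_B", "synonymous", 0.0002, 0.80),  # dropped: not damaging
    Variant("GENE_C", "missense", 0.20, 0.70),      # dropped: too common
    Variant("GENE_D", "missense", 0.001, 0.95),
]

ranked = prioritize(annotation_filter(variants))
print([v.gene for v in ranked])   # ['GENE_D', 'GENE_A']
print(diagnostic_yield(13, 30))   # 43 (patients, as reported in the abstract)
print(diagnostic_yield(11, 25))   # 44 (families, derived from the same counts)
```

Filtering discards variants on hard cutoffs, whereas ranking keeps every remaining candidate and orders them, which is one reason the two strategies can surface different variants for review.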
Degree: Master of Philosophy
Subject: Neuromuscular diseases - Diagnosis; High-throughput nucleotide sequencing
Dept/Program: Paediatrics and Adolescent Medicine
Persistent Identifier: http://hdl.handle.net/10722/336468

 

DC Field | Value | Language
dc.contributor.advisor | Chan, HSS | -
dc.contributor.advisor | Yang, W | -
dc.contributor.author | Lei, Yao | -
dc.contributor.author | 雷尧 | -
dc.date.accessioned | 2024-01-31T10:55:01Z | -
dc.date.available | 2024-01-31T10:55:01Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Lei, Y. [雷尧]. (2023). Development of tailored NGS data analysis pipeline for the diagnosis of neuromuscular disorders. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/336468 | -
dc.description.abstract | (identical to the Abstract given above) | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Neuromuscular diseases - Diagnosis | -
dc.subject.lcsh | High-throughput nucleotide sequencing | -
dc.title | Development of tailored NGS data analysis pipeline for the diagnosis of neuromuscular disorders | -
dc.type | PG_Thesis | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Paediatrics and Adolescent Medicine | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044657074703414 | -
