
Postgraduate thesis: New methods for studying massive and complex biological data : from ultra-low depth sequence to high resolution structure

Title: New methods for studying massive and complex biological data : from ultra-low depth sequence to high resolution structure
Authors: Li, Shumin (李舒敏)
Advisors: Luo, R; Lam, TW; Yiu, SM
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Li, S. [李舒敏]. (2022). New methods for studying massive and complex biological data : from ultra-low depth sequence to high resolution structure. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Discoveries in molecular biology depend heavily on the emergence and improvement of modern experimental techniques. Although these techniques achieve high resolution, the large, heterogeneous data they produce pose a significant challenge for computational methods. This thesis introduces new methods and platforms to address three fundamental biological challenges: genotype-phenotype association, gene regulatory network inference, and the dynamic organization of chromatin in living cells. Genome-wide association studies (GWAS) have succeeded for a variety of diseases and traits because they assess genomic variation comprehensively across large datasets. To balance cost against efficiency, an integrative pipeline was built for GWAS based on ultra-low-coverage whole-genome sequencing (WGS). Genotype imputation was evaluated across combinations of coverage below 0.1x and sample sizes ranging from 2,000 to 16,000, using 17,844 embryo preimplantation genetic testing samples (average coverage 0.04x). Furthermore, 1,744 delivered samples were included in a GWAS exploring the association between fetal genomes and gestational age. A comprehensive post-GWAS analysis reported 11 genomic risk loci with 166 mapped genes. A joint analysis with published gene expression profiles established associations between gestational age-related genes (CRHBP, ICAM1, DKK1, etc.) and preterm birth, infant disease, and breast cancer. The second fundamental problem is the reconstruction of gene regulatory networks. Transcription factors (TFs), microRNAs (miRNAs), and long non-coding RNAs (lncRNAs) are major regulators of heart development and function. Although numerous algorithms have been proposed to infer co-regulation, they do not sufficiently account for the highly diverse, interacting regulators that control the dynamics of gene expression over time.
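The cost argument behind an ultra-low-coverage GWAS design can be made concrete with a minimal sketch, assuming a simple Poisson (Lander-Waterman) model of read depth and an ordinary least-squares test of a quantitative trait on imputed allele dosage. It shows why imputation is unavoidable at 0.04x and what a single-variant test on gestational age looks like. The function names `fraction_covered` and `dosage_association`, the toy dosages, and the toy trait values are hypothetical and are not taken from the thesis; real GWAS pipelines also adjust for covariates such as ancestry principal components.

```python
import math


def fraction_covered(mean_coverage: float, min_reads: int = 1) -> float:
    """Expected fraction of genomic sites with >= min_reads reads (Poisson model, assumed)."""
    # P(X >= k) = 1 - sum_{i < k} exp(-c) * c**i / i!
    p_fewer = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - p_fewer


def dosage_association(dosages, trait):
    """OLS regression of a quantitative trait on imputed allele dosage (0 to 2).

    Hypothetical helper: returns (beta, t_statistic) with no covariates modeled.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se_beta


if __name__ == "__main__":
    # At 0.04x, about 96% of sites receive no read at all, so genotypes must be imputed.
    print(f"sites with >=1 read:  {fraction_covered(0.04):.4f}")
    print(f"sites with >=2 reads: {fraction_covered(0.04, 2):.6f}")
    # Toy dosages and gestational ages (weeks), for illustration only.
    beta, t = dosage_association([0, 0, 1, 1, 2, 2], [38.0, 38.5, 39.0, 39.6, 40.1, 40.4])
    print(f"beta={beta:.2f} weeks per allele, t={t:.2f}")
```

Under this model only about 4% of sites carry even one read at 0.04x, which is why pooling information across thousands of samples through imputation, rather than calling genotypes per sample, is the practical route.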
This thesis proposes HeartGRM, an integrative framework for building a gene regulatory network from time-series gene expression profiles. HeartGRM is applied to study heart development in both in vitro and in vivo differentiated cardiomyocytes (CMs), identifying distinct cardiac TF-miRNA-lncRNA regulatory programs during the early and late stages of CM differentiation. The platform is expected to be applicable to the study of cardiac development and disease, as well as to other biomedical fields. Beyond regulatory factors such as TFs and ncRNAs, chromatin compaction influences the accessibility of promoter regions and is a major mechanism of epigenetic regulation. Single-molecule localization microscopy (SMLM) has emerged as a technology for generating dynamic super-resolution images of subcellular organization. However, adapting SMLM to chromatin faces two obstacles: suitable self-blinking probes are lacking, and extracting chromatin fibers from tens of millions of raw signals with current tools is tedious. To recognize chromatin fibers and determine their structural characteristics, a novel algorithm and a corresponding chromatin fiber imaging (CFI) platform are proposed. Applying these methods captured the structural plasticity of chromatin fibers and fast chromatin fiber dynamics at a temporal resolution below 2 s.
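HeartGRM's own algorithm is not described in this abstract. As a generic, hypothetical illustration of how time-series expression can suggest regulator-to-target edges, the sketch below scores each candidate pair by the Pearson correlation between the regulator's profile at time t and the target's profile at time t + lag. The gene names (`TF_A`, `lnc_B`, `miR_C`), the threshold, and the lag are assumptions, and time-lagged correlation is a deliberately simple stand-in, not the HeartGRM method.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lagged_score(regulator, target, lag=1):
    """Correlate regulator expression at time t with target expression at time t + lag."""
    if not 1 <= lag < len(regulator):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return pearson(regulator[:-lag], target[lag:])


def rank_edges(profiles, regulators, targets, lag=1, threshold=0.8):
    """Return (regulator, target, score) edges with |score| >= threshold, strongest first."""
    edges = []
    for reg in regulators:
        for tgt in targets:
            if reg == tgt:
                continue
            score = lagged_score(profiles[reg], profiles[tgt], lag)
            if abs(score) >= threshold:
                edges.append((reg, tgt, score))
    return sorted(edges, key=lambda edge: abs(edge[2]), reverse=True)


if __name__ == "__main__":
    # Hypothetical expression profiles over six differentiation time points.
    profiles = {
        "TF_A": [1, 2, 4, 8, 8, 8],
        "lnc_B": [0, 1, 2, 4, 8, 8],
        "miR_C": [8, 8, 4, 2, 1, 0],
    }
    for reg, tgt, score in rank_edges(profiles, ["TF_A"], ["lnc_B", "miR_C"]):
        print(f"{reg} -> {tgt}: {score:+.2f}")
```

Requiring a lag encodes the assumption that a regulator's change precedes its target's response, and the sign separates putative activation from repression. With only a handful of time points such scores are noisy, so a practical pipeline would typically pair them with significance testing and prior knowledge such as TF binding motifs or miRNA target predictions.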
Degree: Doctor of Philosophy
Subject: Bioinformatics
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/332080

 

DC Field | Value | Language
dc.contributor.advisor | Luo, R | -
dc.contributor.advisor | Lam, TW | -
dc.contributor.advisor | Yiu, SM | -
dc.contributor.author | Li, Shumin | -
dc.contributor.author | 李舒敏 | -
dc.date.accessioned | 2023-09-29T04:40:23Z | -
dc.date.available | 2023-09-29T04:40:23Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Li, S. [李舒敏]. (2022). New methods for studying massive and complex biological data : from ultra-low depth sequence to high resolution structure. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/332080 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Bioinformatics | -
dc.title | New methods for studying massive and complex biological data : from ultra-low depth sequence to high resolution structure | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044609109103414 | -
