postgraduate thesis: Computational methods for regulatory analysis of transcription process in single cells
| Title | Computational methods for regulatory analysis of transcription process in single cells |
|---|---|
| Authors | Hou, Ruiyan (厚蕊燕) |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Hou, R. [厚蕊燕]. (2025). Computational methods for regulatory analysis of transcription process in single cells. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | The central dogma of molecular biology explains the flow of genetic information within
a biological system. The process by which "DNA produces RNA" is complex. DNA is
transcribed into precursor messenger RNA (pre-mRNA), which undergoes processing
to form mature RNA. The processing includes capping, splicing, and 3’-end cleavage
followed by polyadenylation. Once RNAs have fulfilled their function, they are degraded.
This thesis presents three projects spanning this whole process, each developing computational tools to
help wet-lab researchers explore the underlying biological mechanisms.
RNA splicing efficiency is of high interest for both understanding the regulatory
machinery of gene expression and estimating the RNA velocity in single cells. However,
its genomic regulation and stochasticity across contexts remain poorly understood.
In the first project, by leveraging the recent RNA velocity tool, we estimated
the relative splicing efficiency across a variety of single-cell RNA-Seq data sets. We
further extracted large sets of genomic features and 120 RNA-binding protein features
and found that they are highly predictive of relative RNA splicing efficiency across multiple
tissues and organs in human and mouse.
This predictive power holds promise for revealing the complexity of RNA processing and for
enhancing the analysis of single-cell transcription activities.
Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular
transcriptomes; however, its power for analysing transcription start sites (TSSs) has not been fully utilised.
In the second project, we present a computational method suite, CamoTSS, to precisely identify TSSs
and quantify their expression by leveraging the cDNA on read 1, which enables effective detection of alternative TSS usage.
With various experimental data sets, we have demonstrated that CamoTSS can accurately identify TSSs and that the detected alternative TSS usage shows strong specificity in different biological processes, including cell types across human organs, the development of the human thymus, and cancer conditions. As evidenced in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns, including systematic TSS dysregulation.
Three-prime single-cell RNA-seq (scRNA-seq) has been widely employed to dissect the variability of cellular transcriptomes, but only the cDNA on read 2 is routinely used, including for analysing polyadenylation sites (PAS). However, despite high sequencing noise, we found that the cDNA on read 1 is highly informative for precisely detecting PAS. In the third project, we further developed a computational method, scTail,
to identify PAS using read 1 and quantify their expression using read 2, which enables effective detection of alternative PAS usage (PAU). Compared with other methods, scTail detects PAS more sensitively and precisely. With various experimental
data sets, we demonstrated that scTail can discover differential alternative PAS usage in
various biological processes, including cell types in the human intestine, disease status in
esophageal squamous cell carcinoma, and time points of mouse forelimb histogenesis,
revealing critical insights into PAS regulation. |
| Degree | Doctor of Philosophy |
| Subject | Nucleotide sequence; Machine learning |
| Dept/Program | Biomedical Sciences |
| Persistent Identifier | http://hdl.handle.net/10722/364009 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Hou, Ruiyan | - |
| dc.contributor.author | 厚蕊燕 | - |
| dc.date.accessioned | 2025-10-20T02:56:31Z | - |
| dc.date.available | 2025-10-20T02:56:31Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Hou, R. [厚蕊燕]. (2025). Computational methods for regulatory analysis of transcription process in single cells. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/364009 | - |
| dc.description.abstract | The central dogma of molecular biology explains the flow of genetic information within a biological system. The process by which "DNA produces RNA" is complex. DNA is transcribed into precursor messenger RNA (pre-mRNA), which undergoes processing to form mature RNA. The processing includes capping, splicing, and 3’-end cleavage followed by polyadenylation. Once RNAs have fulfilled their function, they are degraded. This thesis presents three projects spanning this whole process, each developing computational tools to help wet-lab researchers explore the underlying biological mechanisms. RNA splicing efficiency is of high interest for both understanding the regulatory machinery of gene expression and estimating the RNA velocity in single cells. However, its genomic regulation and stochasticity across contexts remain poorly understood. In the first project, by leveraging the recent RNA velocity tool, we estimated the relative splicing efficiency across a variety of single-cell RNA-Seq data sets. We further extracted large sets of genomic features and 120 RNA-binding protein features and found that they are highly predictive of relative RNA splicing efficiency across multiple tissues and organs in human and mouse. This predictive power holds promise for revealing the complexity of RNA processing and for enhancing the analysis of single-cell transcription activities. Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power for analysing transcription start sites (TSSs) has not been fully utilised. In the second project, we present a computational method suite, CamoTSS, to precisely identify TSSs and quantify their expression by leveraging the cDNA on read 1, which enables effective detection of alternative TSS usage. With various experimental data sets, we have demonstrated that CamoTSS can accurately identify TSSs and that the detected alternative TSS usage shows strong specificity in different biological processes, including cell types across human organs, the development of the human thymus, and cancer conditions. As evidenced in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns, including systematic TSS dysregulation. Three-prime single-cell RNA-seq (scRNA-seq) has been widely employed to dissect the variability of cellular transcriptomes, but only the cDNA on read 2 is routinely used, including for analysing polyadenylation sites (PAS). However, despite high sequencing noise, we found that the cDNA on read 1 is highly informative for precisely detecting PAS. In the third project, we further developed a computational method, scTail, to identify PAS using read 1 and quantify their expression using read 2, which enables effective detection of alternative PAS usage (PAU). Compared with other methods, scTail detects PAS more sensitively and precisely. With various experimental data sets, we demonstrated that scTail can discover differential alternative PAS usage in various biological processes, including cell types in the human intestine, disease status in esophageal squamous cell carcinoma, and time points of mouse forelimb histogenesis, revealing critical insights into PAS regulation. | en |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Nucleotide sequence | - |
| dc.subject.lcsh | Machine learning | - |
| dc.title | Computational methods for regulatory analysis of transcription process in single cells | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Biomedical Sciences | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045117252103414 | - |
