File Download
Supplementary
-
Citations:
- Appears in Collections:
postgraduate thesis: Phylogenetic tree reconstruction with protein linkage
Title | Phylogenetic tree reconstruction with protein linkage |
---|---|
Authors | |
Advisors | Advisor(s):Chin, FYL |
Issue Date | 2012 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816 |
Abstract | Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the Fitch-Hartigan algorithm and the Sankoff algorithm are two polynomial-time algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent.
Since the function of a protein may depend on the existence of other proteins, the evolutionary history of these functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, the protein linkage distance for some pairs/sets of proteins: whole block linkage distance, partial block linkage distance, pairwise linkage distance is introduced. It is proved that the phylogenetic tree reconstruction problem to find the binary strings for the internal nodes of a phylogenetic tree that minimizes the sum of the Hamming distance and the linkage distance is NP-hard.
In this thesis, a general algorithm to solve the phylogenetic tree reconstruction with protein linkage problem which runs in O(4^m⋅n) time for whole/partial block linkage distance and O(4^m⋅⋅ (m+n)) time for pairwise linkage distance (compared to the straight-forward O(4^m⋅ m⋅ n) or O(4^m⋅ m^2⋅⋅ n) time algorithm) is introduced where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance. |
Degree | Master of Philosophy |
Subject | Phylogeny. Combinatorial analysis. |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/181488 |
HKU Library Item ID | b4961816 |
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Chin, FYL | - |
dc.contributor.author | Yu, Junjie. | - |
dc.contributor.author | 于俊杰. | - |
dc.date.accessioned | 2013-03-03T03:20:04Z | - |
dc.date.available | 2013-03-03T03:20:04Z | - |
dc.date.issued | 2012 | - |
dc.identifier.citation | Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816 | - |
dc.identifier.uri | http://hdl.handle.net/10722/181488 | - |
dc.description.abstract | Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of the species. Existing algorithms usually represent each species as a binary string with each bit indicating whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree with each leaf representing a species (a binary string of equal length) and each internal node representing the hypothetical ancestor, the Fitch-Hartigan algorithm and the Sankoff algorithm are two polynomial-time algorithms which assign binary strings to internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (Hamming distance) between two species and by assuming the evolutionary history of each protein is independent. Since the function of a protein may depend on the existence of other proteins, the evolutionary history of these functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, the protein linkage distance for some pairs/sets of proteins: whole block linkage distance, partial block linkage distance, pairwise linkage distance is introduced. It is proved that the phylogenetic tree reconstruction problem to find the binary strings for the internal nodes of a phylogenetic tree that minimizes the sum of the Hamming distance and the linkage distance is NP-hard. In this thesis, a general algorithm to solve the phylogenetic tree reconstruction with protein linkage problem which runs in O(4^m⋅n) time for whole/partial block linkage distance and O(4^m⋅⋅ (m+n)) time for pairwise linkage distance (compared to the straight-forward O(4^m⋅ m⋅ n) or O(4^m⋅ m^2⋅⋅ n) time algorithm) is introduced where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.source.uri | http://hub.hku.hk/bib/B49618167 | - |
dc.subject.lcsh | Phylogeny. | - |
dc.subject.lcsh | Combinatorial analysis. | - |
dc.title | Phylogenetic tree reconstruction with protein linkage | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b4961816 | - |
dc.description.thesisname | Master of Philosophy | - |
dc.description.thesislevel | Master | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b4961816 | - |
dc.date.hkucongregation | 2013 | - |
dc.identifier.mmsid | 991034142119703414 | - |