postgraduate thesis: Phylogenetic tree reconstruction with protein linkage

Title: Phylogenetic tree reconstruction with protein linkage
Authors: Yu, Junjie (于俊杰)
Advisor(s): Chin, FYL
Issue Date: 2012
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816
Abstract: Phylogenetic tree reconstruction for a set of species is an important problem for understanding the evolutionary history of those species. Existing algorithms usually represent each species as a binary string in which each bit indicates whether a particular gene/protein exists in the species. Given the topology of a phylogenetic tree in which each leaf represents a species (a binary string, all of equal length) and each internal node represents a hypothetical ancestor, the Fitch-Hartigan algorithm and the Sankoff algorithm are two polynomial-time algorithms that assign binary strings to the internal nodes such that the total Hamming distance between adjacent nodes in the tree is minimized. However, these algorithms oversimplify the evolutionary process by considering only the number of protein insertions/deletions (the Hamming distance) between two species and by assuming that the evolutionary history of each protein is independent. Since the function of a protein may depend on the existence of other proteins, the evolutionary histories of functionally dependent proteins should be similar, i.e. functionally dependent proteins should usually be present (or absent) in a species at the same time. Thus, in addition to the Hamming distance, a protein linkage distance is introduced for some pairs/sets of proteins, in three forms: whole block linkage distance, partial block linkage distance, and pairwise linkage distance. It is proved that the phylogenetic tree reconstruction problem of finding binary strings for the internal nodes of a phylogenetic tree that minimize the sum of the Hamming distance and the linkage distance is NP-hard.
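As an illustration of the Hamming-only baseline described above, the following is a minimal sketch of a Fitch-style bottom-up pass on binary presence/absence strings. It is not taken from the thesis: the function name `fitch`, the nested-tuple tree encoding, and the four example species are assumptions made for this sketch, and the tree is assumed to be rooted and binary.

```python
from typing import Dict, List, Set, Tuple, Union

# Assumed encoding: a tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree, leaves: Dict[str, str]) -> Tuple[List[Set[str]], int]:
    """Return (candidate bit sets per position, minimum number of bit changes).

    Each position (protein) is treated independently, which is exactly the
    independence assumption the abstract criticises.
    """
    if isinstance(tree, str):
        # A leaf is fixed: every position has a single possible bit.
        return [{bit} for bit in leaves[tree]], 0

    left_sets, left_cost = fitch(tree[0], leaves)
    right_sets, right_cost = fitch(tree[1], leaves)

    sets: List[Set[str]] = []
    cost = left_cost + right_cost
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            # Children agree on at least one bit: no change needed here.
            sets.append(common)
        else:
            # Children disagree: keep both options and pay one change.
            sets.append(a | b)
            cost += 1
    return sets, cost


# Hypothetical example: four species, three proteins each.
example_leaves = {"A": "110", "B": "100", "C": "011", "D": "001"}
example_tree = (("A", "B"), ("C", "D"))

_, total_changes = fitch(example_tree, example_leaves)
print(total_changes)  # 4
```

The returned count is the minimum total Hamming distance over the edges of the given tree; a top-down pass (omitted here) would pick one concrete bit per position for each internal node.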
In this thesis, a general algorithm is introduced to solve the phylogenetic tree reconstruction with protein linkage problem. It runs in O(4^m ⋅ n) time for whole/partial block linkage distance and O(4^m ⋅ (m+n)) time for pairwise linkage distance (compared to the straightforward O(4^m ⋅ m ⋅ n) or O(4^m ⋅ m^2 ⋅ n) time algorithms), where n is the number of species and m is the length of the binary string (number of proteins). It is further shown, by experiments, that our algorithm using linkage information can construct more accurate trees (better matches with the trees constructed by biologists) than the algorithms using only Hamming distance.
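To make the 4^m factor in these bounds concrete, here is a hedged sketch of a straightforward Sankoff-style dynamic program that treats every whole m-bit string as one state, so each tree edge compares all 2^m ⋅ 2^m state pairs. It is not the thesis's improved algorithm. The names `sankoff` and `edge_cost`, the tree encoding, and the example data are assumptions; only the Hamming cost is shown, because this record does not define the linkage distances.

```python
from typing import Callable, Dict, List, Union

# Assumed encoding: a tree is a leaf name (str) or a pair (left, right) of subtrees.
Tree = Union[str, tuple]
INF = float("inf")


def hamming(x: int, y: int) -> int:
    """Number of differing bits between two states encoded as integers."""
    return bin(x ^ y).count("1")


def sankoff(
    tree: Tree,
    leaves: Dict[str, str],
    m: int,
    edge_cost: Callable[[int, int], int] = hamming,
) -> List[float]:
    """Return table[s]: minimum cost of the subtree if its root holds state s."""
    states = range(1 << m)  # all 2^m binary strings of length m
    if isinstance(tree, str):
        observed = int(leaves[tree], 2)
        # A leaf can only hold its observed string.
        return [0 if s == observed else INF for s in states]

    table: List[float] = [0] * (1 << m)
    for child in tree:
        child_table = sankoff(child, leaves, m, edge_cost)
        for s in states:
            # 2^m parent states times 2^m child states: the 4^m factor per edge.
            table[s] += min(edge_cost(s, t) + child_table[t] for t in states)
    return table


# Hypothetical example: four species, three proteins each.
leaves = {"A": "110", "B": "100", "C": "011", "D": "001"}
tree = (("A", "B"), ("C", "D"))

best = min(sankoff(tree, leaves, m=3))
print(best)  # 4
```

With the Hamming cost, each of the 4^m comparisons per edge takes O(m) time, which is consistent with the straightforward O(4^m ⋅ m ⋅ n) bound quoted above. `edge_cost` is left as a parameter so that a cost combining Hamming and linkage terms could be substituted; doing so couples the bits, which is why the per-bit shortcut of the Fitch-Hartigan algorithm no longer applies.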
Degree: Master of Philosophy
Subject: Phylogeny; Combinatorial analysis.
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/181488
HKU Library Item ID: b4961816

 

DC Field | Value | Language
dc.contributor.advisor | Chin, FYL | -
dc.contributor.author | Yu, Junjie. | -
dc.contributor.author | 于俊杰. | -
dc.date.accessioned | 2013-03-03T03:20:04Z | -
dc.date.available | 2013-03-03T03:20:04Z | -
dc.date.issued | 2012 | -
dc.identifier.citation | Yu, J. [于俊杰]. (2012). Phylogenetic tree reconstruction with protein linkage. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b4961816 | -
dc.identifier.uri | http://hdl.handle.net/10722/181488 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.source.uri | http://hub.hku.hk/bib/B49618167 | -
dc.subject.lcsh | Phylogeny. | -
dc.subject.lcsh | Combinatorial analysis. | -
dc.title | Phylogenetic tree reconstruction with protein linkage | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b4961816 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b4961816 | -
dc.date.hkucongregation | 2013 | -
dc.identifier.mmsid | 991034142119703414 | -
