Article: A data-mining approach for multiple structural alignment of proteins

Title: A data-mining approach for multiple structural alignment of proteins
Authors: Siu, WY; Mamoulis, N; Yiu, SM; Chan, HL
Keywords: Structural comparisons; Proteins; Multiple alignment
Issue Date: 2010
Publisher: Biomedical Informatics Publishing Group. The Journal's web site is located at http://www.bioinformation.net/
Citation: Bioinformation, 2010, v. 4 n. 8, p. 366-370
Abstract: Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when much less information or assumptions are available. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in the 3D space, without sequence order information, and the objective is to discover all large enough alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in the 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller or comparable to that of the existing tools.
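
The binning-then-mining idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the structures already share one coordinate frame (full geometric hashing would also enumerate reference frames), it uses a plain cubic grid as the hash function, and it counts co-occurring groups by brute-force subset enumeration instead of a real frequent-pattern miner. The names `bin_points`, `frequent_groups`, `cell`, `min_support`, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def bin_points(structures, cell=2.0):
    """Map each grid cell to the set of structure names with a point in it.

    Assumption: a cubic grid of side `cell` stands in for the hash table.
    """
    bins = defaultdict(set)
    for name, points in structures.items():
        for x, y, z in points:
            key = (int(x // cell), int(y // cell), int(z // cell))
            bins[key].add(name)
    return bins


def frequent_groups(bins, min_support=2, min_size=2):
    """Treat each bin as a coincidence group and count co-occurring subsets.

    Brute-force enumeration is exponential in the bin size; it is only
    meant for toy inputs.
    """
    support = defaultdict(int)
    for members in bins.values():
        ordered = sorted(members)
        for k in range(min_size, len(ordered) + 1):
            for group in combinations(ordered, k):
                support[group] += 1
    return {g: s for g, s in support.items() if s >= min_support}


# Toy data (made up): three "proteins" as unordered 3D point sets.
structures = {
    "A": [(0.1, 0.2, 0.3), (4.1, 0.2, 0.3), (8.5, 8.5, 8.5)],
    "B": [(0.5, 0.9, 1.1), (4.9, 1.0, 0.1), (20.0, 20.0, 20.0)],
    "C": [(0.3, 0.3, 0.3), (12.0, 12.0, 12.0), (30.0, 1.0, 1.0)],
}

result = frequent_groups(bin_points(structures))
print(result)
```

In this toy input, A and B share two grid cells while C shares only one with either of them, so only the pair (A, B) reaches the support threshold. Each surviving group would then be a candidate alignment to extend, in the spirit of the heuristic step the abstract mentions.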
Persistent Identifier: http://hdl.handle.net/10722/129981
ISSN: 0973-2063
2022 Impact Factor: 1.9
PubMed Central ID: PMC2951672

DC Field | Value | Language
dc.contributor.author | Siu, WY | en_US
dc.contributor.author | Mamoulis, N | en_US
dc.contributor.author | Yiu, SM | en_US
dc.contributor.author | Chan, HL | en_US
dc.date.accessioned | 2010-12-23T08:45:07Z | -
dc.date.available | 2010-12-23T08:45:07Z | -
dc.date.issued | 2010 | en_US
dc.identifier.citation | Bioinformation, 2010, v. 4 n. 8, p. 366-370 | en_US
dc.identifier.issn | 0973-2063 | -
dc.identifier.uri | http://hdl.handle.net/10722/129981 | -
dc.language | eng | en_US
dc.publisher | Biomedical Informatics Publishing Group. The Journal's web site is located at http://www.bioinformation.net/ | -
dc.relation.ispartof | Bioinformation | en_US
dc.subject | Structural comparisons | -
dc.subject | Proteins | -
dc.subject | Multiple alignment | -
dc.title | A data-mining approach for multiple structural alignment of proteins | en_US
dc.type | Article | en_US
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0973-2063&volume=4&issue=8&spage=366&epage=370&date=2010&atitle=A+data-mining+approach+for+multiple+structural+alignment+of+proteins | -
dc.identifier.email | Mamoulis, N: nikos@cs.hku.hk | en_US
dc.identifier.email | Yiu, SM: smyiu@cs.hku.hk | en_US
dc.identifier.email | Chan, HL: hlchan@cs.hku.hk | en_US
dc.identifier.authority | Mamoulis, N=rp00155 | en_US
dc.identifier.authority | Yiu, SM=rp00207 | en_US
dc.identifier.authority | Chan, HL=rp01310 | en_US
dc.description.nature | published_or_final_version | -
dc.identifier.pmid | 21079664 | -
dc.identifier.pmcid | PMC2951672 | -
dc.identifier.hkuros | 177372 | en_US
dc.identifier.volume | 4 | en_US
dc.identifier.issue | 8 | en_US
dc.identifier.spage | 366 | en_US
dc.identifier.epage | 370 | en_US
dc.identifier.issnl | 0973-2063 | -
