
Article: Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest

Title: Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest
Authors: Wang, Xin; Lin, Peijie; Ho, Joshua W.K.
Keywords:
  DNA motif
  Cell-type specificity
  Transcription factor
  Random Forest
  Cis-regulatory element
Issue Date: 2018
Citation: BMC Genomics, 2018, v. 19, article no. 929
Abstract: © 2018 The Author(s).
Background: It has been observed that many transcription factors (TFs) bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model?
Results: We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar.
Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
Persistent Identifier: http://hdl.handle.net/10722/262777
ISI Accession Number ID: WOS:000422886100015
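
The approach summarized in the abstract (a multi-class Random Forest over motif content, evaluated by cross-validation, followed by rule extraction) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the motif names, cell-type labels, and the synthetic "grammar" are hypothetical placeholders, the forest is a deliberately small toy, and `tree_rules` simply lists root-to-leaf paths rather than reproducing the rule mining method of the paper.

```python
import random
from collections import Counter

# Hypothetical motif names and cell types; the "grammar" below is synthetic.
MOTIFS = ["motif_A", "motif_B", "motif_C", "motif_D", "motif_E", "motif_F"]
GRAMMAR = {"cell_type_1": {0, 1}, "cell_type_2": {2, 3}, "cell_type_3": {4, 5}}

def make_toy_data(n_per_class=40, noise=0.1, seed=1):
    """Binary motif-presence vectors; each bit is flipped with probability `noise`."""
    rng = random.Random(seed)
    rows, labels = [], []
    for cell, idxs in GRAMMAR.items():
        for _ in range(n_per_class):
            rows.append([int((j in idxs) != (rng.random() < noise))
                         for j in range(len(MOTIFS))])
            labels.append(cell)
    return rows, labels

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_feats, rng, depth=0, max_depth=4):
    """Return a leaf label, or a node (feature, subtree_if_absent, subtree_if_present)."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(rows[0])), n_feats):  # random feature subset
        absent = [i for i, r in enumerate(rows) if r[f] == 0]
        present = [i for i, r in enumerate(rows) if r[f] == 1]
        if not absent or not present:
            continue
        score = (len(absent) * gini([labels[i] for i in absent]) +
                 len(present) * gini([labels[i] for i in present])) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, absent, present)
    if best is None:  # none of the sampled features splits this node
        return majority(labels)
    _, f, absent, present = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  n_feats, rng, depth + 1, max_depth)
    return (f, grow(absent), grow(present))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, absent, present = tree
        tree = present if row[f] else absent
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_feats, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return majority([predict_tree(t, row) for t in forest])

def cross_validate(rows, labels, k=5, seed=0):
    """k-fold cross-validated accuracy of the toy forest."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        forest = train_forest([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_forest(forest, rows[i]) == labels[i] for i in test)
    return correct / len(rows)

def tree_rules(tree, path=()):
    """Yield (conditions, predicted_label) for every root-to-leaf path."""
    if not isinstance(tree, tuple):
        yield path, tree
        return
    f, absent, present = tree
    yield from tree_rules(absent, path + ((MOTIFS[f], False),))
    yield from tree_rules(present, path + ((MOTIFS[f], True),))

if __name__ == "__main__":
    rows, labels = make_toy_data()
    print("cross-validated accuracy:", cross_validate(rows, labels))
    rules = Counter(r for t in train_forest(rows, labels) for r in tree_rules(t))
    for (conds, label), count in rules.most_common(3):
        print(count, label, conds)
```

In this sketch each binding site is reduced to a vector of motif presence/absence flags, and a "rule" is the conjunction of motif conditions along one tree path together with the predicted cell type. Counting how often the same path recurs across trees is only one naive way to rank rules, chosen here for brevity.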

 

DC Field | Value | Language
dc.contributor.author | Wang, Xin | -
dc.contributor.author | Lin, Peijie | -
dc.contributor.author | Ho, Joshua W.K. | -
dc.date.accessioned | 2018-10-08T02:47:00Z | -
dc.date.available | 2018-10-08T02:47:00Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | BMC Genomics, 2018, v. 19, article no. 929 | -
dc.identifier.uri | http://hdl.handle.net/10722/262777 | -
dc.language | eng | -
dc.relation.ispartof | BMC Genomics | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | DNA motif | -
dc.subject | Cell-type specificity | -
dc.subject | Transcription factor | -
dc.subject | Random Forest | -
dc.subject | Cis-regulatory element | -
dc.title | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1186/s12864-017-4340-z | -
dc.identifier.pmid | 29363433 | -
dc.identifier.scopus | eid_2-s2.0-85040710852 | -
dc.identifier.volume | 19 | -
dc.identifier.spage | article no. 929 | -
dc.identifier.epage | article no. 929 | -
dc.identifier.eissn | 1471-2164 | -
dc.identifier.isi | WOS:000422886100015 | -
dc.identifier.issnl | 1471-2164 | -
