Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1186/s12864-017-4340-z
- Scopus: eid_2-s2.0-85040710852
- PMID: 29363433
- WOS: WOS:000422886100015
Article: Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest
Title | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest |
---|---|
Authors | Wang, Xin; Lin, Peijie; Ho, Joshua W.K. |
Keywords | DNA motif; Cell-type specificity; Transcription factor; Random Forest; Cis-regulatory element |
Issue Date | 2018 |
Citation | BMC Genomics, 2018, v. 19, article no. 929 |
Abstract | © 2018 The Author(s). Background: Many transcription factors (TFs) bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF binds to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If so, it may be possible to observe different combinations of TF motifs (a motif grammar) at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? Results: We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets, for the TFs TCF7L2 and MAX, across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific. |
Persistent Identifier | http://hdl.handle.net/10722/262777 |
ISI Accession Number ID | WOS:000422886100015 |
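
The abstract does not give implementation details, so the following is only a rough illustration of what a "motif grammar" rule might look like. It is a minimal sketch that scores motif combinations by plain co-occurrence counting, which is a simplified stand-in and not the Random Forest rule mining described in the article. The function name `mine_motif_rules`, the motif labels (`core`, `A`, `B`, `C`), the cell-type labels, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_motif_rules(sites, min_support=2, max_size=2):
    """Score motif combinations by how strongly they indicate one cell type.

    sites: list of (set_of_motif_names, cell_type) pairs (hypothetical input).
    Returns (motifs, cell_type, support, confidence) tuples, best first.
    """
    combo_total = Counter()    # number of sites containing the combination
    combo_by_cell = Counter()  # same count, split by cell type
    for motifs, cell in sites:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(motifs), size):
                combo_total[combo] += 1
                combo_by_cell[(combo, cell)] += 1

    rules = []
    for (combo, cell), support in combo_by_cell.items():
        if support >= min_support:
            confidence = support / combo_total[combo]
            rules.append((combo, cell, support, confidence))

    # Highest confidence first, then support, then larger combinations.
    rules.sort(key=lambda r: (-r[3], -r[2], -len(r[0]), r[0], r[1]))
    return rules


# Toy binding sites: every site carries the same "core" motif, but the
# co-occurring motifs differ between the two made-up cell types.
toy_sites = [
    ({"core", "A"}, "cell_type_1"),
    ({"core", "A"}, "cell_type_1"),
    ({"core", "A", "B"}, "cell_type_1"),
    ({"core", "B"}, "cell_type_2"),
    ({"core", "B"}, "cell_type_2"),
    ({"core", "B", "C"}, "cell_type_2"),
]

rules = mine_motif_rules(toy_sites)
for motifs, cell, support, confidence in rules:
    print(" + ".join(motifs), "->", cell, support, round(confidence, 2))
```

In this toy data the `core` motif alone appears equally often in both cell types and so carries no cell-type information, while its combination with `A` occurs only in `cell_type_1`. That is the kind of pattern the motif-grammar hypothesis predicts, shown here on invented data only.
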
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Xin | - |
dc.contributor.author | Lin, Peijie | - |
dc.contributor.author | Ho, Joshua W.K. | - |
dc.date.accessioned | 2018-10-08T02:47:00Z | - |
dc.date.available | 2018-10-08T02:47:00Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | BMC Genomics, 2018, v. 19, article no. 929 | - |
dc.identifier.uri | http://hdl.handle.net/10722/262777 | - |
dc.language | eng | - |
dc.relation.ispartof | BMC Genomics | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | DNA motif | - |
dc.subject | Cell-type specificity | - |
dc.subject | Transcription factor | - |
dc.subject | Random Forest | - |
dc.subject | Cis-regulatory element | - |
dc.title | Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest | - |
dc.type | Article | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.1186/s12864-017-4340-z | - |
dc.identifier.pmid | 29363433 | - |
dc.identifier.scopus | eid_2-s2.0-85040710852 | - |
dc.identifier.volume | 19 | - |
dc.identifier.spage | article no. 929 | - |
dc.identifier.epage | article no. 929 | - |
dc.identifier.eissn | 1471-2164 | - |
dc.identifier.isi | WOS:000422886100015 | - |
dc.identifier.issnl | 1471-2164 | - |