Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest

Wang, Xin; Lin, Peijie; Ho, Joshua W.K.

Publisher Website: 10.1186/s12864-017-4340-z
Scopus: eid_2-s2.0-85040710852
PMID: 29363433
WOS: WOS:000422886100015

Appears in Collections:
- Biomedical Sciences: Journal/Magazine Articles

Title	Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest
Authors	Wang, Xin; Lin, Peijie; Ho, Joshua W.K.
Keywords	DNA motif; Cell-type specificity; Transcription factor; Random Forest; Cis-regulatory element
Issue Date	2018
Citation	BMC Genomics, 2018, v. 19, article no. 929. DOI: http://dx.doi.org/10.1186/s12864-017-4340-z
Abstract	© 2018 The Author(s). Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? Results: We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
Persistent Identifier	http://hdl.handle.net/10722/262777
ISI Accession Number ID	WOS:000422886100015

DC Field	Value	Language
dc.contributor.author	Wang, Xin	-
dc.contributor.author	Lin, Peijie	-
dc.contributor.author	Ho, Joshua W.K.	-
dc.date.accessioned	2018-10-08T02:47:00Z	-
dc.date.available	2018-10-08T02:47:00Z	-
dc.date.issued	2018	-
dc.identifier.citation	BMC Genomics, 2018, v. 19, article no. 929	-
dc.identifier.uri	http://hdl.handle.net/10722/262777	-
dc.description.abstract	© 2018 The Author(s). Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though an individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model? Results: We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.	-
dc.language	eng	-
dc.relation.ispartof	BMC Genomics	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	DNA motif	-
dc.subject	Cell-type specificity	-
dc.subject	Transcription factor	-
dc.subject	Random Forest	-
dc.subject	Cis-regulatory element	-
dc.title	Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest	-
dc.type	Article	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1186/s12864-017-4340-z	-
dc.identifier.pmid	29363433	-
dc.identifier.scopus	eid_2-s2.0-85040710852	-
dc.identifier.volume	19	-
dc.identifier.spage	article no. 929	-
dc.identifier.epage	article no. 929	-
dc.identifier.eissn	1471-2164	-
dc.identifier.isi	WOS:000422886100015	-
dc.identifier.issnl	1471-2164	-
