Article: Predicting target genes of non-coding regulatory variants with IRT

Title: Predicting target genes of non-coding regulatory variants with IRT
Authors: Wu, Zhenqin; Ioannidis, Nilah M.; Zou, James
Issue Date: 2020
Citation: Bioinformatics, 2020, v. 36, n. 16, p. 4440-4448
Abstract: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
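The gene-ranking metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `top_k_accuracy`, the variant and gene identifiers, and all scores below are hypothetical, chosen only to show how a top-k accuracy could be computed from per-gene scores.

```python
from typing import Dict


def top_k_accuracy(
    scores: Dict[str, Dict[str, float]],
    true_targets: Dict[str, str],
    k: int,
) -> float:
    """Fraction of variants whose true target gene is among the k
    highest-scoring candidate genes for that variant."""
    hits = 0
    for variant, gene_scores in scores.items():
        # Rank candidate genes from highest to lowest score.
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_targets[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical scores for two variants, each with three nearby candidate genes.
scores = {
    "variant_1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "variant_2": {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.5},
}
# Hypothetical ground-truth target gene for each variant.
true_targets = {"variant_1": "GENE_A", "variant_2": "GENE_C"}

top1 = top_k_accuracy(scores, true_targets, k=1)
top3 = top_k_accuracy(scores, true_targets, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

In this toy input the true gene is ranked first for one variant and second for the other, so the top-1 value is lower than the top-3 value; the figures are unrelated to the results reported in the abstract.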
Persistent Identifier: http://hdl.handle.net/10722/354167
ISSN: 1367-4803
2023 Impact Factor: 4.4
2023 SCImago Journal Rankings: 2.574
ISI Accession Number ID: WOS:000606794200008

 

DC Field                     Value                                               Language
dc.contributor.author        Wu, Zhenqin                                         -
dc.contributor.author        Ioannidis, Nilah M.                                 -
dc.contributor.author        Zou, James                                          -
dc.date.accessioned          2025-02-07T08:46:54Z                                -
dc.date.available            2025-02-07T08:46:54Z                                -
dc.date.issued               2020                                                -
dc.identifier.citation       Bioinformatics, 2020, v. 36, n. 16, p. 4440-4448    -
dc.identifier.issn           1367-4803                                           -
dc.identifier.uri            http://hdl.handle.net/10722/354167                  -
dc.language                  eng                                                 -
dc.relation.ispartof         Bioinformatics                                      -
dc.title                     Predicting target genes of non-coding regulatory variants with IRT    -
dc.type                      Article                                             -
dc.description.nature        link_to_subscribed_fulltext                         -
dc.identifier.doi            10.1093/bioinformatics/btaa254                      -
dc.identifier.pmid           32330225                                            -
dc.identifier.scopus         eid_2-s2.0-85094222280                              -
dc.identifier.volume         36                                                  -
dc.identifier.issue          16                                                  -
dc.identifier.spage          4440                                                -
dc.identifier.epage          4448                                                -
dc.identifier.eissn          1460-2059                                           -
dc.identifier.isi            WOS:000606794200008                                 -
