Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models

Li, W; Miao, W; Cui, J; Fang, C; Su, S; Li, H; Hu, LH; Lu, Y; Chen, G

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1021/acs.jcim.8b00878
Scopus: eid_2-s2.0-85064854778
PMID: 30912940
WOS: WOS:000469884900016

Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Chemistry: Journal/Magazine Articles

Article: Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models

Title	Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models
Authors	Li, W; Miao, W; Cui, J; Fang, C; Su, S; Li, H; Hu, LH; Lu, Y; Chen, G
Issue Date	2019
Publisher	American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics
Citation	Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857. DOI: http://dx.doi.org/10.1021/acs.jcim.8b00878
Abstract	Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks), are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases. © 2019 American Chemical Society.
Persistent Identifier	http://hdl.handle.net/10722/279298
ISSN	1549-9596
2021 Impact Factor	6.162
2020 SCImago Journal Rankings	1.240
ISI Accession Number ID	WOS:000469884900016
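
The abstract names Bagging as one of the ensemble schemes used to correct low-level DFT interaction energies toward a CCSD(T)/CBS benchmark. As a rough illustration only, the sketch below shows what bagging means in that setting: several copies of a simple learner are fitted on bootstrap resamples of the training set, and their predictions are averaged. Everything in it is an assumption made for illustration. The energies are synthetic, the base learner is a one-feature least-squares line, and the helper names (`rmse`, `fit_line`, `bagged_fit`, `bagged_predict`) are hypothetical; none of this is taken from the paper, which uses its own descriptors, learners, and the S66, S22, and X40 databases.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit one line per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the predictions of all bootstrap models."""
    return sum(a * x + b for a, b in models) / len(models)


# Synthetic stand-ins (NOT data from the paper), in kcal/mol:
# e_ref plays the role of benchmark interaction energies,
# e_dft plays the role of a low-level DFT result with a systematic error.
rng = random.Random(42)
e_ref = [rng.uniform(-20.0, -0.5) for _ in range(60)]
e_dft = [0.9 * e + 0.4 + rng.gauss(0.0, 0.15) for e in e_ref]

# Train on the first 40 complexes, evaluate on the remaining 20.
models = bagged_fit(e_dft[:40], e_ref[:40])
e_corr = [bagged_predict(models, x) for x in e_dft[40:]]

rmse_raw = rmse(e_dft[40:], e_ref[40:])
rmse_corr = rmse(e_corr, e_ref[40:])
print(f"RMSE before correction: {rmse_raw:.2f} kcal/mol")
print(f"RMSE after correction:  {rmse_corr:.2f} kcal/mol")
```

The same pattern carries over to richer base learners. Boosting and Stacking, which the abstract also mentions, combine learners differently (sequentially and through a meta-learner, respectively) and are not shown here.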

DC Field	Value	Language
dc.contributor.author	Li, W	-
dc.contributor.author	Miao, W	-
dc.contributor.author	Cui, J	-
dc.contributor.author	Fang, C	-
dc.contributor.author	Su, S	-
dc.contributor.author	Li, H	-
dc.contributor.author	Hu, LH	-
dc.contributor.author	Lu, Y	-
dc.contributor.author	Chen, G	-
dc.date.accessioned	2019-10-25T13:53:01Z	-
dc.date.available	2019-10-25T13:53:01Z	-
dc.date.issued	2019	-
dc.identifier.citation	Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857	-
dc.identifier.issn	1549-9596	-
dc.identifier.uri	http://hdl.handle.net/10722/279298	-
dc.description.abstract	Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks), are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases. © 2019 American Chemical Society.	-
dc.language	eng	-
dc.publisher	American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics	-
dc.relation.ispartof	Journal of Chemical Information and Modeling	-
dc.rights	This document is the Accepted Manuscript version of a Published Work that appeared in final form in [JournalTitle], copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see [insert ACS Articles on Request author-directed link to Published Work, see http://pubs.acs.org/page/policy/articlesonrequest/index.html].	-
dc.title	Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models	-
dc.type	Article	-
dc.identifier.email	Chen, G: ghchen@hku.hk	-
dc.identifier.authority	Chen, G=rp00671	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1021/acs.jcim.8b00878	-
dc.identifier.pmid	30912940	-
dc.identifier.scopus	eid_2-s2.0-85064854778	-
dc.identifier.hkuros	308213	-
dc.identifier.volume	59	-
dc.identifier.issue	5	-
dc.identifier.spage	1849	-
dc.identifier.epage	1857	-
dc.identifier.isi	WOS:000469884900016	-
dc.publisher.place	United States	-
dc.identifier.issnl	1549-9596	-
