Article: Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models

Title: Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models
Authors: Li, W; Miao, W; Cui, J; Fang, C; Su, S; Li, H; Hu, LH; Lu, Y; Chen, G
Issue Date: 2019
Publisher: American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics
Citation: Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857
Abstract: Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness, and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases. © 2019 American Chemical Society.
Persistent Identifier: http://hdl.handle.net/10722/279298
ISSN: 1549-9596
2017 Impact Factor: 3.804
2015 SCImago Journal Rankings: 1.610
ISI Accession Number ID: WOS:000469884900016
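
The abstract describes correcting low-level DFT interaction energies toward a CCSD(T)/CBS reference with ensemble learners and judging the result by RMSE. As a rough illustration only, the sketch below applies a bagging ensemble (bootstrap-resampled least-squares lines, averaged) to synthetic numbers. Everything in it is an assumption made for illustration: the data are randomly generated and are not from S66, S22, or X40; the single input feature, the linear base learner, and all function names are hypothetical; and this record does not state which learners or descriptors the authors actually used.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # a degenerate resample falls back to the mean
    return a, my - a * mx


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Bagging: fit one base learner per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the predictions of all base learners."""
    return sum(a * x + b for a, b in models) / len(models)


# Synthetic stand-ins (assumption): "reference" plays the role of benchmark
# interaction energies in kcal/mol, "low_level" a cheap method with a
# systematic error plus noise.
rng = random.Random(42)
reference = [rng.uniform(-20.0, -0.5) for _ in range(60)]
low_level = [0.85 * e + 0.6 + rng.gauss(0.0, 0.2) for e in reference]

train, test = range(0, 40), range(40, 60)
models = bagged_fit([low_level[i] for i in train], [reference[i] for i in train])
corrected = [bagged_predict(models, low_level[i]) for i in test]

test_ref = [reference[i] for i in test]
rmse_raw = rmse([low_level[i] for i in test], test_ref)
rmse_corrected = rmse(corrected, test_ref)
print(f"RMSE before: {rmse_raw:.2f}  after: {rmse_corrected:.2f} kcal/mol")
```

Boosting and Stacking differ in how the base learners are combined (sequential reweighting and a meta-learner over heterogeneous models, respectively); the held-out RMSE comparison at the end would stay the same.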

 

DC Field: Value
dc.contributor.author: Li, W
dc.contributor.author: Miao, W
dc.contributor.author: Cui, J
dc.contributor.author: Fang, C
dc.contributor.author: Su, S
dc.contributor.author: Li, H
dc.contributor.author: Hu, LH
dc.contributor.author: Lu, Y
dc.contributor.author: Chen, G
dc.date.accessioned: 2019-10-25T13:53:01Z
dc.date.available: 2019-10-25T13:53:01Z
dc.date.issued: 2019
dc.identifier.citation: Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857
dc.identifier.issn: 1549-9596
dc.identifier.uri: http://hdl.handle.net/10722/279298
dc.language: eng
dc.publisher: American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics
dc.relation.ispartof: Journal of Chemical Information and Modeling
dc.rights: This document is the Accepted Manuscript version of a Published Work that appeared in final form in [JournalTitle], copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see [insert ACS Articles on Request author-directed link to Published Work, see http://pubs.acs.org/page/policy/articlesonrequest/index.html].
dc.title: Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models
dc.type: Article
dc.identifier.email: Chen, G: ghchen@hku.hk
dc.identifier.authority: Chen, G=rp00671
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1021/acs.jcim.8b00878
dc.identifier.pmid: 30912940
dc.identifier.scopus: eid_2-s2.0-85064854778
dc.identifier.hkuros: 308213
dc.identifier.volume: 59
dc.identifier.issue: 5
dc.identifier.spage: 1849
dc.identifier.epage: 1857
dc.identifier.isi: WOS:000469884900016
dc.publisher.place: United States