Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1021/acs.jcim.8b00878
- Scopus: eid_2-s2.0-85064854778
- PMID: 30912940
- WOS: WOS:000469884900016
Article: Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models
Title | Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models |
---|---|
Authors | Li, W; Miao, W; Cui, J; Fang, C; Su, S; Li, H; Hu, LH; Lu, Y; Chen, G |
Issue Date | 2019 |
Publisher | American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics |
Citation | Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857 |
Abstract | Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases. © 2019 American Chemical Society. |
Persistent Identifier | http://hdl.handle.net/10722/279298 |
ISSN | 1549-9596 (2023 Impact Factor: 5.6; 2023 SCImago Journal Rankings: 1.396) |
ISI Accession Number ID | WOS:000469884900016 |
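
The abstract describes correcting low-level DFT interaction energies toward CCSD(T)/CBS reference values with ensemble learners and scoring the result by RMSE. As a rough illustration only, the sketch below shows a Bagging-style correction in plain Python: each base learner is a one-variable least-squares line fitted on a bootstrap resample, and the ensemble prediction is the mean over the learners. The energies are synthetic placeholders, and the names `rmse`, `fit_line`, and `bagged_correction` are hypothetical; none of this reproduces the paper's descriptors, base learners, or data.

```python
import math
import random


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    )


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def bagged_correction(low_level, reference, n_learners=25, seed=0):
    """Fit n_learners lines on bootstrap resamples; return a predictor."""
    rng = random.Random(seed)
    n = len(low_level)
    models = []
    for _ in range(n_learners):
        # Bootstrap: draw n training indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            fit_line([low_level[i] for i in idx], [reference[i] for i in idx])
        )

    def predict(x):
        # Bagging aggregates by averaging the base learners' predictions.
        return sum(a * x + b for a, b in models) / len(models)

    return predict


# Synthetic placeholder energies in kcal/mol (assumption: NOT taken from
# S66, S22, or X40). The "low-level" values carry a systematic error plus noise.
data_rng = random.Random(42)
reference = [-data_rng.uniform(0.5, 20.0) for _ in range(60)]
low_level = [0.85 * e + 0.6 + data_rng.gauss(0.0, 0.2) for e in reference]

predict = bagged_correction(low_level, reference)
corrected = [predict(x) for x in low_level]

rmse_raw = rmse(low_level, reference)
rmse_corrected = rmse(corrected, reference)
print(f"raw RMSE: {rmse_raw:.2f}, corrected RMSE: {rmse_corrected:.2f}")
```

The errors printed here are in-sample, so they only show the mechanics of the correction. A realistic assessment would score systems held out from training, for example by cross-validation.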
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, W | - |
dc.contributor.author | Miao, W | - |
dc.contributor.author | Cui, J | - |
dc.contributor.author | Fang, C | - |
dc.contributor.author | Su, S | - |
dc.contributor.author | Li, H | - |
dc.contributor.author | Hu, LH | - |
dc.contributor.author | Lu, Y | - |
dc.contributor.author | Chen, G | - |
dc.date.accessioned | 2019-10-25T13:53:01Z | - |
dc.date.available | 2019-10-25T13:53:01Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Journal of Chemical Information and Modeling, 2019, v. 59 n. 5, p. 1849-1857 | - |
dc.identifier.issn | 1549-9596 | - |
dc.identifier.uri | http://hdl.handle.net/10722/279298 | - |
dc.description.abstract | Machine learning has exhibited powerful capabilities in many areas. However, machine learning models are mostly database dependent, requiring a new model if the database changes. Therefore, a universal model is highly desired to accommodate the widest variety of databases. Fortunately, this universality may be achieved by ensemble learning, which can integrate multiple learners to meet the demands of diversified databases. Therefore, we propose a general procedure for learning ensemble establishment based on noncovalent interactions (NCIs) databases. Additionally, accurate NCI computation is quite demanding for first-principles methods, for which a competent machine learning model can be an efficient solution to obtain high NCI accuracy with minimal computational resources. In regard to these aspects, multiple schemes of ensemble learning models (Bagging, Boosting, and Stacking frameworks) are explored in this study. The models are based on various low levels of density functional theory (DFT) calculations for the benchmark databases S66, S22, and X40. All NCIs computed by the DFT calculations can be improved to high-level accuracy (root-mean-square error RMSE = 0.22 kcal/mol in contrast to CCSD(T)/CBS benchmark) by established ensemble learning models. Compared with single machine learning models, ensemble models show better accuracy (RMSE of the best model is further lowered by ∼25%), robustness and goodness-of-fit according to evaluation parameters suggested by the OECD. Among ensemble learning models, heterogeneous Stacking ensemble models show the most valuable application potential. The standardized procedure of constructing learning ensembles has been well utilized on several NCI data sets, and this procedure may also be applicable for other chemical databases. © 2019 American Chemical Society. | - |
dc.language | eng | - |
dc.publisher | American Chemical Society. The Journal's web site is located at http://pubs.acs.org/jcics | - |
dc.relation.ispartof | Journal of Chemical Information and Modeling | - |
dc.rights | This document is the Accepted Manuscript version of a Published Work that appeared in final form in Journal of Chemical Information and Modeling, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see [insert ACS Articles on Request author-directed link to Published Work, see http://pubs.acs.org/page/policy/articlesonrequest/index.html]. | - |
dc.title | Efficient Corrections for DFT Noncovalent Interactions Based on Ensemble Learning Models | - |
dc.type | Article | - |
dc.identifier.email | Chen, G: ghchen@hku.hk | - |
dc.identifier.authority | Chen, G=rp00671 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1021/acs.jcim.8b00878 | - |
dc.identifier.pmid | 30912940 | - |
dc.identifier.scopus | eid_2-s2.0-85064854778 | - |
dc.identifier.hkuros | 308213 | - |
dc.identifier.volume | 59 | - |
dc.identifier.issue | 5 | - |
dc.identifier.spage | 1849 | - |
dc.identifier.epage | 1857 | - |
dc.identifier.isi | WOS:000469884900016 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 1549-9596 | - |