MoleculeNet: A benchmark for molecular machine learning

Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1039/c7sc02664a
Scopus: eid_2-s2.0-85040102038
WOS: WOS:000419350700030
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: MoleculeNet: A benchmark for molecular machine learning

Title	MoleculeNet: A benchmark for molecular machine learning
Authors	Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, Vijay
Issue Date	2018
Citation	Chemical Science, 2018, v. 9, n. 2, p. 513-530. DOI: http://dx.doi.org/10.1039/c7sc02664a
Abstract	Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
Persistent Identifier	http://hdl.handle.net/10722/354386
ISSN	2041-6520
2023 Impact Factor	7.6
2023 SCImago Journal Rankings	2.333
ISI Accession Number ID	WOS:000419350700030

DC Field	Value	Language
dc.contributor.author	Wu, Zhenqin	-
dc.contributor.author	Ramsundar, Bharath	-
dc.contributor.author	Feinberg, Evan N.	-
dc.contributor.author	Gomes, Joseph	-
dc.contributor.author	Geniesse, Caleb	-
dc.contributor.author	Pappu, Aneesh S.	-
dc.contributor.author	Leswing, Karl	-
dc.contributor.author	Pande, Vijay	-
dc.date.accessioned	2025-02-07T08:48:17Z	-
dc.date.available	2025-02-07T08:48:17Z	-
dc.date.issued	2018	-
dc.identifier.citation	Chemical Science, 2018, v. 9, n. 2, p. 513-530	-
dc.identifier.issn	2041-6520	-
dc.identifier.uri	http://hdl.handle.net/10722/354386	-
dc.description.abstract	Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.	-
dc.language	eng	-
dc.relation.ispartof	Chemical Science	-
dc.title	MoleculeNet: A benchmark for molecular machine learning	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1039/c7sc02664a	-
dc.identifier.scopus	eid_2-s2.0-85040102038	-
dc.identifier.volume	9	-
dc.identifier.issue	2	-
dc.identifier.spage	513	-
dc.identifier.epage	530	-
dc.identifier.eissn	2041-6539	-
dc.identifier.isi	WOS:000419350700030	-
