Genetic Optimization of Training Sets for Improved Machine Learning Models of Molecular Properties

Browning, Nicholas J. and Ramakrishnan, Raghunathan and von Lilienfeld, O. Anatole and Roethlisberger, Ursula. (2017) Genetic Optimization of Training Sets for Improved Machine Learning Models of Molecular Properties. Journal of Physical Chemistry Letters, 8 (7). pp. 1351-1359.


Official URL: https://edoc.unibas.ch/63254/



The training of molecular models of quantum mechanical properties based on statistical machine learning requires large data sets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve, as prior knowledge may be unavailable. Ordinarily, representative selection of training molecules from such data sets is achieved through random sampling. We use genetic algorithms for the optimization of training set composition consisting of tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate: in the limit of small training sets, mean absolute errors for out-of-sample predictions are reduced by up to ∼75%. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets.
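
The idea of evolving training set composition can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not specify the model, the fitness function, or the genetic operators: the sketch uses a Gaussian kernel ridge regression, scores each candidate training set by its mean absolute error on a held-out validation set, and applies truncation selection, union-based crossover, and swap mutation. The names `krr_mae` and `optimize_training_set` and all parameter values are hypothetical.

```python
import numpy as np


def krr_mae(X, y, train_idx, val_idx, sigma=1.0, lam=1e-3):
    """MAE on val_idx of a Gaussian kernel ridge model fitted on train_idx."""
    Xt, Xv = X[train_idx], X[val_idx]
    d_tt = ((Xt[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    d_vt = ((Xv[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d_tt / (2.0 * sigma**2))
    alpha = np.linalg.solve(K + lam * np.eye(len(train_idx)), y[train_idx])
    pred = np.exp(-d_vt / (2.0 * sigma**2)) @ alpha
    return float(np.abs(pred - y[val_idx]).mean())


def optimize_training_set(X, y, pool_idx, val_idx, n_train,
                          pop_size=20, n_gen=30, mut_rate=0.1, seed=0):
    """Toy genetic algorithm over fixed-size training subsets (assumed operators)."""
    rng = np.random.default_rng(seed)
    pool_idx = np.asarray(pool_idx)

    def fitness(ind):
        return krr_mae(X, y, ind, val_idx)

    # Each individual is an array of n_train distinct indices into the pool.
    pop = [rng.choice(pool_idx, size=n_train, replace=False)
           for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw the child from the union of both parents.
            genes = np.union1d(parents[a], parents[b])
            child = rng.choice(genes, size=n_train, replace=False)
            # Mutation: swap a few members for pool entries not in the child.
            outside = np.setdiff1d(pool_idx, child)
            n_mut = min(rng.binomial(n_train, mut_rate), len(outside))
            if n_mut > 0:
                drop = rng.choice(n_train, size=n_mut, replace=False)
                child[drop] = rng.choice(outside, size=n_mut, replace=False)
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history
```

Because the fitness is measured on the validation set, any comparison against random sampling should be reported on a separate test set that the genetic algorithm never sees; otherwise the optimized set is scored on the data it was tuned to.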
Faculties and Departments: 05 Faculty of Science > Departement Chemie > Former Organization Units Chemistry > Physikalische Chemie (Lilienfeld)
UniBasel Contributors: von Lilienfeld, Anatole
Item Type: Article, refereed
Article Subtype: Research Article
Publisher: American Chemical Society
Note: Publication type according to Uni Basel Research Database: Journal article
Last Modified: 07 Jul 2020 07:48
Deposited On: 16 Apr 2018 07:14
