Inferring chemistry from data with atomistic machine learning: applications to potential energy surfaces and chemical space

Vazquez Salazar, Luis Itza. Inferring chemistry from data with atomistic machine learning: applications to potential energy surfaces and chemical space. 2024, Doctoral Thesis, University of Basel, Faculty of Science.

Available under License CC BY-NC (Attribution-NonCommercial).

Official URL: https://edoc.unibas.ch/96402/

Abstract

The influence of machine learning (ML) in chemistry is undeniable: it is a powerful tool for obtaining chemical insights from large amounts of data. In particular, ML is well suited to exploring chemical space because it yields good results in a relatively short time. The quality of the results obtained with an ML model depends strongly on the data used to train it. After introducing fundamental concepts in Chapters 1 and 2, Chapter 3 deals with the effect of training data on predicting a chemical property. Results show that adequate predictions require a large chemical diversity in the training set. This can be achieved either by using many chemical motifs or by employing an adequate number of conformers. Once the effect of the data is clear, the next aspect evaluated is the confidence in the predictions obtained with ML models. To this end, two uncertainty quantification strategies based on Bayesian statistics were implemented. The insights into the interplay between error, uncertainty and chemistry provide an essential understanding of how a chemical database can be constructed. The previous chapters deal with the use of data obtained from ab initio calculations. Nevertheless, a model is also expected to reproduce experimental results. Chapter 5 deals with improving a potential energy surface (PES) based on experimental results by employing a procedure called morphing. Continuing with the study of PESs, Chapter 6 uses one of the models introduced in Chapter 3 to study a reactive process. In this case, the performance of detecting outliers through uncertainty quantification was evaluated and compared with two other strategies. Finally, Chapter 7 explores adding samples from the conformational space represented by a PES to chemical databases biased towards a chemical insight. The last chapter summarizes the different aspects of the relationships between data and chemistry for exploring chemical space or working with PESs. It also provides insights into future extensions of the projects presented here.
Advisors: Meuwly, Markus
Committee Members: Lill, Markus A. and Tkatchenko, Alexandre
Faculties and Departments: 05 Faculty of Science > Departement Chemie > Chemie > Physikalische Chemie (Meuwly)
UniBasel Contributors: Meuwly, Markus and Lill, Markus A.
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 15389
Thesis status: Complete
Number of Pages: VI, 228, 4
Language: English
Identification Number:
  • urn: urn:nbn:ch:bel-bau-diss153896
edoc DOI:
Last Modified: 08 Aug 2024 04:30
Deposited On: 07 Aug 2024 11:46