Quantum machine learning in chemical space

Faber, Felix Andreas. Quantum machine learning in chemical space. 2019, Doctoral Thesis, University of Basel, Faculty of Science.


Official URL: http://edoc.unibas.ch/diss/DissB_13439



This thesis focuses on the overlap of first-principles quantum methods and machine learning in computational chemistry and materials science, commonly referred to as Quantum Machine Learning (QML).
Assessing and benchmarking the performance of existing machine learning models on various classes of compounds and chemical properties is a substantial part of this thesis. These results are used to better understand which machine learning models are best suited for a given combination of properties and compounds. For example, thirteen electronic ground state properties of $\sim$131k organic molecules, calculated at the hybrid-DFT level of theory, were used to gauge the predictive accuracy of combinations of representations and regressors. The out-of-sample prediction errors of the models on the hybrid-DFT quality data are on par with, or close to, the CCSD(T) error relative to experimental values, indicating that reference data need to go beyond hybrid-DFT if QML predictions are to surpass chemical accuracy.
Another area of focus is the development of new and accurate QML models. A new representation of an atom in its chemical environment is introduced by rethinking the way structural and chemical compound information is encoded into training data. The representation interpolates elemental properties across both atoms and compounds, making it well suited for datasets with high compositional and structural degrees of freedom. Numerical results show that, compared to current benchmarks, this representation yields superior predictive power in combination with kernel ridge regression on a diverse set of systems, including organic molecules, non-covalently bonded protein side-chains, water clusters, and crystalline solids. Furthermore, the role of response operators when learning response properties of the energy is discussed, leading to a formalism in which the corresponding response operator is applied directly to the quantum machine learning model. Training QML models with this formalism results in lower out-of-sample errors than learning the corresponding properties directly. The formalism can also be used to reproduce accurate normal modes and IR spectra of molecules.
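As a rough illustration of the response-operator idea, the following toy sketch fits a Gaussian-kernel model to the energies and forces of a one-dimensional harmonic potential, obtaining the force by applying $-\partial/\partial x$ to the kernel expansion of the energy. The potential, kernel width, regularization, and all function names are assumptions chosen for illustration; this is not the representation or the implementation used in the thesis.

```python
import numpy as np

def gaussian_kernel(xa, xb, sigma):
    """k(x, x') = exp(-(x - x')^2 / (2 sigma^2)) for 1D inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def gaussian_kernel_dx(xa, xb, sigma):
    """Derivative of k(x, x') with respect to its first argument x."""
    d = xa[:, None] - xb[None, :]
    return -d / sigma**2 * np.exp(-d**2 / (2.0 * sigma**2))

def fit(x_train, energies, forces, sigma=0.5, reg=1e-8):
    """Fit alpha in E(x) = sum_j alpha_j k(x, x_j) to energies and forces.

    The force model is the response operator applied to the energy
    model: F(x) = -dE/dx = -sum_j alpha_j dk(x, x_j)/dx.
    """
    K = gaussian_kernel(x_train, x_train, sigma)
    dK = gaussian_kernel_dx(x_train, x_train, sigma)
    A = np.vstack([K, -dK])                  # energy rows, then force rows
    y = np.concatenate([energies, forces])
    # Regularized least squares via the normal equations.
    n = len(x_train)
    return np.linalg.solve(A.T @ A + reg * np.eye(n), A.T @ y)

def predict(x_new, x_train, alpha, sigma=0.5):
    """Return predicted energies and forces at x_new."""
    e = gaussian_kernel(x_new, x_train, sigma) @ alpha
    f = -gaussian_kernel_dx(x_new, x_train, sigma) @ alpha
    return e, f

# Toy reference data (assumed): harmonic potential E = x^2, F = -2x.
x_train = np.linspace(-2.0, 2.0, 15)
alpha = fit(x_train, x_train**2, -2.0 * x_train)

x_test = np.array([-1.3, 0.25, 0.9])
e_pred, f_pred = predict(x_test, x_train, alpha)
```

Because the forces come from differentiating the same model that predicts the energies, the two predictions are mutually consistent by construction.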
Finally, the applicability of QML models is explored. A machine learning model which encodes the elemental identities of the atoms placed in each site is used to exhaustively screen the formation energies of $\sim$2 million elpasolite crystals. The model's accuracy improves systematically with additional training data, reaching 0.1 eV/atom when trained on 10k crystals. Out of the $\sim$2 million crystals, we identify 90 unique structures which span the convex hull of stability, among them NFAl$_2$Ca$_6$, which has an uncommon stoichiometry and a negative atomic oxidation state for Al.
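One simple way to encode which element sits on each crystal site is sketched below. It assumes a period/group encoding for the four sites of an ABC$_2$D$_6$ elpasolite; the abstract does not specify the encoding, so the feature choice, the `PERIOD_GROUP` table, and the `encode_sites` helper are hypothetical and purely illustrative.

```python
# (period, group) positions in the periodic table for a few elements.
PERIOD_GROUP = {
    "N": (2, 15), "F": (2, 17), "Na": (3, 1), "Al": (3, 13),
    "K": (4, 1), "Ca": (4, 2),
}

def encode_sites(elements):
    """Map the elements on the four sites to a flat feature vector.

    Each site contributes two numbers, the period and the group of
    the element that occupies it (an assumed encoding).
    """
    if len(elements) != 4:
        raise ValueError("expected one element per site (4 sites)")
    features = []
    for symbol in elements:
        period, group = PERIOD_GROUP[symbol]
        features.extend([period, group])
    return features

# The mineral elpasolite, K2NaAlF6, with sites ordered by
# multiplicity 1, 1, 2, 6 (the ordering is a convention assumed here).
vec = encode_sites(("Al", "Na", "K", "F"))
```

A fixed-length vector of this kind depends only on the site occupations, so every candidate composition on the same crystal prototype can be enumerated and passed to a regressor without relaxing its geometry first.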
Advisors: von Lilienfeld, O. Anatole and Goedecker, Stefan
Faculties and Departments: 05 Faculty of Science > Departement Chemie > Former Organization Units Chemistry > Physikalische Chemie (Lilienfeld)
UniBasel Contributors: Goedecker, Stefan
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 13439
Thesis status: Complete
Number of Pages: 1 online resource (x, 151 pages)
Identification Number:
edoc DOI:
Last Modified: 11 Dec 2019 05:30
Deposited On: 10 Dec 2019 15:26
