Assessment of Quantum and Machine Learning Methods for Molecular Property Predictions

Tahchieva, Diana Nikolaeva. Assessment of Quantum and Machine Learning Methods for Molecular Property Predictions. 2020, Doctoral Thesis, University of Basel, Faculty of Science.


Official URL: https://edoc.unibas.ch/78541/



Modern computational quantum chemistry methods have become an invaluable tool in chemistry, helping to direct compound and materials design and providing atomistic detail on interesting chemical phenomena. The utility of these methods rests on the fact that some can reach “chemical accuracy”, that is, errors below 1 kcal/mol with respect to experimentally observed values. Unfortunately, most of these highly accurate methods suffer from exponentially increasing computational cost with system size and are therefore limited to relatively small systems. Consequently, varying degrees of approximation must be applied to extend quantum mechanical treatments of atoms and molecules to larger systems. The reliability of these approximations, however, often depends significantly on the property being calculated and on the subset of chemical compound space for which they were designed. Indeed, many of these methods produce excellent results when tested on one part of chemical space but fail entirely in another. A thorough and rigorous assessment of these methods is therefore a crucial task. Such an assessment, however, requires a diverse and highly accurate benchmark set with minimal bias in data selection. Although a large amount of high-quality reference data has been produced over the years, much of it covers a relatively narrow region of chemical space, and as research interest shifts towards more exotic chemical species, the available quantum chemistry data becomes sparser. This thesis focuses on the design of more diverse and less biased chemical data sets, which are then used to derive insight into structure-property relationships and to improve the accuracy and scope of standard computational chemistry methods.

In the first part of the thesis, the torsional potential energy surfaces of a set of halogenated thiocarbonyl derivatives were investigated. The richness of the data set with respect to halogen diversity helped reveal a clear correlation between the shape of the torsional profile and the halogen type(s) contained in the query molecule. However, a rather worrisome observation was made about the performance of some of the most popular quantum chemistry methods in predicting these torsional potential energy surfaces. A comparison of Hartree-Fock (HF) and sixteen density functional theory (DFT) approximations with reference CCSD(T) results revealed that the majority of the methods predict qualitatively and quantitatively incorrect torsional profiles for molecules containing at least one heavy halogen. It was further determined that approximately 50% exact exchange in the DFT methods is a crucial ingredient for an appropriate description of torsional profiles. Moreover, a new torsion-corrected atom-centered potential (TCACP) was proposed to remedy the performance of DFT methods in a plane-wave basis.

The second part of this thesis presents an automated multi-reference study of the singlet-triplet energy splittings of eight thousand machine-generated carbenes, encompassing a large carbene chemical space with a wide range of spin gaps. Analysis of the carbene compositional and electronic structure revealed strong hyperconjugation across tetravalent carbon. Furthermore, a remarkable universal upper limit of the vertical spin gap was established and verified by a detailed derivation based on the underlying physics of the electronic structure of this carbene chemical space.

The richness of this chemical data set was subsequently exploited in the third part of this thesis, where the interplay between quantum chemistry and machine learning methods was investigated for the prediction of spin energy splittings. First, the performance of popular methods was assessed by comparing their results to a high-order multi-reference level of theory (MRCISD+Q). It was demonstrated that all methods except the state-averaged complete active space self-consistent field (SA-CASSCF) method are unreliable for the prediction of singlet-triplet energy gaps of triplet-state carbenes. Thereafter, different combinations of quantum chemistry and machine learning methods were compared as possible strategies for screening carbene chemical space. The strategy that offered the best compromise between computational efficiency and accuracy was then used to predict approximately one hundred thousand singlet-triplet energy splittings. While obtaining accurate multi-reference energies incurs a significant computational overhead, we show that a suitable quantum machine learning strategy offers the prospect of compound exploration across a vast chemical space.
Advisors: von Lilienfeld, Anatole and Meuwly, Markus
Faculties and Departments: 05 Faculty of Science > Departement Chemie > Former Organization Units Chemistry > Physikalische Chemie (Lilienfeld)
UniBasel Contributors: von Lilienfeld, Anatole and Meuwly, Markus
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 13764
Thesis status: Complete
Number of Pages: xi, 145
Identification Number:
  • urn: urn:nbn:ch:bel-bau-diss137643
Last Modified: 01 Aug 2021 01:30
Deposited On: 27 Jan 2021 16:03
