Efficient algorithms in protein modelling

Studer, Gabriel. Efficient algorithms in protein modelling. 2017, Doctoral Thesis, University of Basel, Faculty of Science.

Available under License CC BY-NC (Attribution-NonCommercial).


Official URL: http://edoc.unibas.ch/diss/DissB_12497



Proteins are key players in the complex world of living cells. Whether they are involved in enzymatic reactions, inter-cell communication or numerous other processes, knowledge of their structure is vital for a detailed understanding of their function. However, experimental structure determination is often laborious and cannot keep up with the ever-increasing pace of sequencing. As a consequence, the gap between proteins for which only the sequence is known and those for which detailed structural information is also available is growing rapidly. Computational modelling methods that extrapolate structural information from homologous structures have established themselves as a valuable complement to experiment and help bridge this gap. This thesis addresses two key aspects of protein modelling:
(1) It investigates and improves methods that assign reliability estimates to protein models, so-called quality estimation (QE) methods. Errors introduced during modelling are often not immediately apparent, even to a human expert, which makes automated methods for this task important.
(2) It assesses the available methods for the modelling step itself, discusses solutions to current shortcomings and provides efficient implementations of them.
When detecting errors in protein models, many knowledge-based methods are biased towards the physicochemical properties observed in soluble protein structures. This limits their applicability to the important class of membrane protein models. To improve the situation, QMEANBrane has been developed. It is specifically designed to detect local errors in membrane protein models using membrane-specific statistical potentials of mean force, which, given the growing amount of available experimental data, now approach statistical saturation.
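Statistical potentials of mean force are commonly derived by inverse Boltzmann statistics. The formula below is a general textbook formulation shown for illustration, not the exact QMEANBrane parametrization; in particular, conditioning on an environment label s (for example, a membrane or a soluble segment) is an assumption about how membrane specificity can be introduced:

```latex
\Delta E_{a}(x \mid s) = -k_B T \,\ln \frac{P_{\mathrm{obs}}(x \mid a, s)}{P_{\mathrm{ref}}(x \mid s)}
```

Here x is a structural feature (for example, a pairwise distance), a an interaction type (for example, a pair of atom types), P_obs the frequency observed in experimental structures and P_ref the frequency expected in a reference state. Roughly speaking, statistical saturation means that P_obs no longer changes appreciably as more experimental structures are added.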
To improve quality estimation for soluble proteins, QMEANDisCo goes beyond the widely used statistical potentials of mean force and additionally incorporates the structural variability observed among experimentally determined structures homologous to the model being assessed. This provides valuable ensemble information without depending on a large ensemble of protein models, thus circumventing a main limitation of consensus QE methods.
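The idea of drawing ensemble information from homologous structures rather than from many models can be illustrated with a deliberately simplified sketch. It is not the QMEANDisCo implementation. The function name `disco_like_scores`, the C-alpha representation, the 15 Å cutoff, the 2 Å tolerance and the assumption that template coordinates are already mapped onto model residue indices are all hypothetical choices. For each model residue, the sketch reports the fraction of template-derived distance constraints that the model satisfies.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def disco_like_scores(
    model: Sequence[Coord],
    templates: Sequence[Sequence[Optional[Coord]]],
    cutoff: float = 15.0,    # hypothetical: only template distances <= cutoff form constraints
    tolerance: float = 2.0,  # hypothetical: allowed deviation of a model distance
) -> List[Optional[float]]:
    """Hypothetical, simplified per-residue consensus score (not QMEANDisCo).

    templates[t][i] is the C-alpha position in template t aligned to model
    residue i, or None if template t does not cover residue i.
    Returns, per residue, the fraction of satisfied constraints, or None if
    the residue has no constraints at all.
    """
    n = len(model)
    scores: List[Optional[float]] = []
    for i in range(n):
        satisfied = 0
        total = 0
        for j in range(n):
            if j == i:
                continue
            # Collect the i-j distances from every template that covers both
            # residues and places them within the cutoff.
            observed = []
            for tpl in templates:
                if tpl[i] is None or tpl[j] is None:
                    continue
                d = math.dist(tpl[i], tpl[j])
                if d <= cutoff:
                    observed.append(d)
            if not observed:
                continue  # no template information for this pair
            total += 1
            d_model = math.dist(model[i], model[j])
            # The constraint counts as satisfied if the model distance agrees
            # with at least one distance seen in the template ensemble.
            if any(abs(d_model - d) <= tolerance for d in observed):
                satisfied += 1
        scores.append(satisfied / total if total else None)
    return scores
```

In this simplified version, a residue pair contributes a constraint only if at least one template places it within the cutoff, and the model is rewarded for matching any observed distance. Structural variability among the templates therefore widens the range of acceptable distances instead of being penalized.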
Apart from improving QE methods, our efforts to implement and extend state-of-the-art modelling algorithms revealed the lack of a free and efficient modelling engine. No available modelling engine provided an open-source codebase as a basis for novel, innovative algorithms while at the same time imposing no usage restrictions. This ran counter to our goal of making protein modelling available to all biochemists and molecular biologists worldwide. We therefore implemented a new free and open modelling engine from scratch: ProMod3. ProMod3 allows extremely efficient, state-of-the-art modelling algorithms to be combined flexibly to solve various modelling problems.
To relax the "one template, one model" dogma, basic algorithms have been explored that incorporate structural information from multiple templates into a single protein model. The algorithms are built using ProMod3 and have been extensively tested on the CAMEO continuous evaluation platform. The result is a highly competitive modelling pipeline that combines extremely low runtimes with excellent performance.
Advisors: Schwede, Torsten and Maier, Timm
Faculties and Departments: 05 Faculty of Science > Departement Biozentrum > Computational & Systems Biology > Bioinformatics (Schwede)
UniBasel Contributors: Schwede, Torsten and Maier, Timm
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 12497
Thesis status: Complete
Number of Pages: 1 online resource (xi, 154 pages)
Last Modified: 05 Apr 2018 17:36
Deposited On: 16 Mar 2018 14:45
