edoc

Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study

Schweke, Hugo and Xu, Qifang and Tauriello, Gerardo and Pantolini, Lorenzo and Schwede, Torsten and Cazals, Frédéric and Lhéritier, Alix and Fernandez-Recio, Juan and Rodríguez-Lumbreras, Luis Angel and Schueler-Furman, Ora and Varga, Julia K. and Jiménez-García, Brian and Réau, Manon F. and Bonvin, Alexandre M. J. J. and Savojardo, Castrense and Martelli, Pier-Luigi and Casadio, Rita and Tubiana, Jérôme and Wolfson, Haim J. and Oliva, Romina and Barradas-Bautista, Didier and Ricciardelli, Tiziana and Cavallo, Luigi and Venclovas, Česlovas and Olechnovič, Kliment and Guerois, Raphael and Andreani, Jessica and Martin, Juliette and Wang, Xiao and Terashi, Genki and Sarkar, Daipayan and Christoffer, Charles and Aderinwale, Tunde and Verburgt, Jacob and Kihara, Daisuke and Marchand, Anthony and Correia, Bruno E. and Duan, Rui and Qiu, Liming and Xu, Xianjin and Zhang, Shuang and Zou, Xiaoqin and Dey, Sucharita and Dunbrack, Roland L. and Levy, Emmanuel D. and Wodak, Shoshana J. (2023) Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics, 23 (17). e2200323.

PDF - Accepted Version (406Kb)
Restricted to Repository staff only until 27 June 2024.

PDF - Accepted Version (2244Kb)
Restricted to Repository staff only until 27 June 2024.

Official URL: https://edoc.unibas.ch/95445/

Abstract

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
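The evaluation idea summarized in the abstract (score each interface, combine several scores into a consensus, and measure discrimination by the area under the ROC curve) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the rank-averaging scheme used as the consensus, and the six-entry toy data are hypothetical and are not taken from the study, whose consensus score and Random Forest classifier are described in the article itself.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = physiological, 0 = non-physiological (toy convention).
    A higher score is assumed to mean "more likely physiological".
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_normalize(scores):
    """Map scores to [0, 1] by rank (ties share the average rank),
    so that scores on different scales can be averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(len(scores) - 1, 1)
    return [r / denom for r in ranks]


def consensus(score_table):
    """Average the rank-normalized scores of several scoring functions.

    score_table: one list of scores per scoring function, all in the
    same complex order. This rank average is an illustrative choice,
    not the consensus definition used in the study.
    """
    normalized = [rank_normalize(column) for column in score_table]
    return [mean(values) for values in zip(*normalized)]


# Hypothetical toy data: six complexes, two scoring functions on different scales.
labels = [1, 1, 1, 0, 0, 0]
score_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
score_b = [12.0, 15.0, 7.0, 9.0, 3.0, 8.0]
combined = consensus([score_a, score_b])

for name, scores in [("score_a", score_a), ("score_b", score_b), ("consensus", combined)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

Rank normalization is used here only because the two toy scores live on different scales; any monotone rescaling leaves each individual AUC unchanged, so the comparison between individual and combined scores stays fair.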
Faculties and Departments: 05 Faculty of Science > Departement Biozentrum > Computational & Systems Biology > Bioinformatics (Schwede)
UniBasel Contributors: Schwede, Torsten and Tauriello, Gerardo and Pantolini, Lorenzo
Item Type: Article, refereed
Article Subtype: Research Article
Publisher: Wiley
ISSN: 1615-9853
e-ISSN: 1615-9861
Note: Publication type according to Uni Basel Research Database: Journal article
Language: English
Last Modified: 18 Oct 2023 14:58
Deposited On: 25 Sep 2023 14:53
