
Embedding-based alignment: combining protein language models and alignment approaches to detect structural similarities in the twilight-zone

Pantolini, Lorenzo and Studer, Gabriel and Pereira, Joana and Durairaj, Janani and Schwede, Torsten. (2022) Embedding-based alignment: combining protein language models and alignment approaches to detect structural similarities in the twilight-zone.

PDF - Submitted Version (3276 KB)
Available under License CC BY (Attribution).

Official URL: https://edoc.unibas.ch/94399/


Abstract

Language models are now routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful tools in the bioinformatics field. Protein language models (pLMs) generate high-dimensional embeddings on a per-residue level and encode the "semantic meaning" of each individual amino acid in the context of the full protein sequence. Multiple works use these representations as a starting point for downstream learning tasks and, more recently, for identifying distant homologous relationships between proteins. In this work, we introduce a new method that generates embedding-based protein sequence alignments (EBA), and show how these capture structural similarities even in the twilight zone, outperforming both classical sequence-based scores and other approaches based on protein language models. The method shows excellent accuracy despite the absence of training and parameter optimization. We expect that the association of pLMs and alignment methods will soon rise in popularity, helping the detection of relationships between proteins in the twilight zone.
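
The general idea of aligning two proteins through their per-residue embeddings, rather than through a substitution matrix, can be illustrated with a minimal sketch. This is not the authors' EBA implementation: the abstract does not specify the similarity measure or the alignment algorithm, so the cosine similarity, the Needleman-Wunsch-style global alignment, the gap penalty value, and the function names `cosine` and `align_embeddings` are all assumptions made for illustration. The embeddings here are toy vectors; in practice they would come from a pLM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_embeddings(emb_a, emb_b, gap=-0.5):
    """Global alignment over a residue-by-residue similarity matrix.

    emb_a, emb_b: lists of per-residue embedding vectors.
    gap: linear gap penalty (hypothetical value, not from the paper).
    Returns (alignment score, list of aligned index pairs).
    """
    n, m = len(emb_a), len(emb_b)
    # Similarity matrix replaces the classical substitution matrix.
    sim = [[cosine(u, v) for v in emb_b] for u in emb_a]

    # Dynamic-programming table with gap-initialised borders.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # align i with j
                score[i - 1][j] + gap,                    # gap in sequence b
                score[i][j - 1] + gap,                    # gap in sequence a
            )

    # Traceback from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy per-residue embeddings (2-dimensional for readability).
toy_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_b = [[1.0, 0.0], [1.0, 1.0]]
total, aligned = align_embeddings(toy_a, toy_b)
print(total, aligned)
```

With these toy vectors, the first and third residues of the longer sequence pair with the two residues of the shorter one, and the middle residue is left facing a gap. The resulting score could then serve as a similarity measure between the two proteins, under the assumptions stated above.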
Faculties and Departments: 05 Faculty of Science > Departement Biozentrum > Computational & Systems Biology > Bioinformatics (Schwede)
UniBasel Contributors: Pantolini, Lorenzo and Studer, Gabriel and Durairaj, Janani and Schwede, Torsten
Item Type: Preprint
Publisher: Cold Spring Harbor Laboratory
Number of Pages: 9
Note: Publication type according to Uni Basel Research Database: Discussion paper / Internet publication
Language: English
Last Modified: 14 Jun 2023 12:54
Deposited On: 26 Apr 2023 07:33
