Optimization and historical contingency in protein sequences
ORAL · Invited
Abstract
Correlations arising from phylogeny often confound coevolution signal from functional or structural optimization, impairing the inference of structural contacts from sequences. However, inferred Potts models are more robust to these effects than local statistics, which may explain their success [1]. Dedicated corrections can further increase this robustness [2]. Moreover, phylogenetic correlations can in fact provide useful information for some inference tasks, especially for inferring interaction partners among the paralogs of two protein families from their sequences. In this case, signal from phylogeny and signal from constraints combine constructively [3], and explicitly exploiting both further improves inference performance [4].
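To make "local statistics" and "phylogenetic correlations" concrete, here is a minimal sketch, assuming a toy multiple sequence alignment (MSA) given as equal-length strings. It computes a local pairwise statistic, the connected correlation C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) between two alignment columns. It also shows sequence reweighting, a commonly used heuristic that down-weights closely related sequences. The function names `sequence_weights` and `connected_correlation` and the threshold `theta=0.2` are illustrative assumptions. This is neither the Potts-model inference of [1] nor the correction of [2].

```python
def sequence_weights(msa, theta=0.2):
    """Hypothetical helper: weight each sequence by 1/n, where n counts the
    sequences (itself included) within Hamming distance theta * L of it."""
    L = len(msa[0])
    weights = []
    for s in msa:
        n_close = sum(
            1 for t in msa
            if sum(a != b for a, b in zip(s, t)) <= theta * L
        )
        weights.append(1.0 / n_close)
    return weights


def connected_correlation(msa, i, j, a, b, weights=None):
    """Hypothetical helper: C_ij(a,b) = f_ij(a,b) - f_i(a) * f_j(b),
    using (optionally weighted) frequencies over the sequences of the MSA."""
    if weights is None:
        weights = [1.0] * len(msa)
    m_eff = sum(weights)  # effective number of sequences
    f_i = sum(w for s, w in zip(msa, weights) if s[i] == a) / m_eff
    f_j = sum(w for s, w in zip(msa, weights) if s[j] == b) / m_eff
    f_ij = sum(
        w for s, w in zip(msa, weights) if s[i] == a and s[j] == b
    ) / m_eff
    return f_ij - f_i * f_j


msa = ["AA", "AA", "AA", "BB"]  # toy alignment: three identical sequences
print(connected_correlation(msa, 0, 1, "A", "A"))
print(connected_correlation(msa, 0, 1, "A", "A", sequence_weights(msa)))
```

On this toy alignment, the unweighted correlation between the two columns is 0.1875 and the reweighted one is approximately 0.25. Reweighting changes how much redundant sequences count in the statistic, but by itself it does not remove all correlations due to shared ancestry.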
Protein language models have recently been applied to protein sequence data, greatly advancing structure, function and mutational effect prediction. Language models trained on multiple sequence alignments capture not only coevolution and structural contacts but also phylogenetic relationships [5]. They disentangle signal from structural constraints and signal from phylogeny more efficiently than Potts models [5], and they have promising generative properties [6].
Publications:
[1] Dietler N, Lupo U, Bitbol A-F (2022) "Impact of phylogeny on structural contact inference from protein sequence data", https://arxiv.org/abs/2209.13045
[2] Colavin A, Atolia E, Bitbol A-F, Huang KC (2022) "Extracting phylogenetic dimensions of coevolution reveals hidden functional signals", Scientific Reports 12(1):820
[3] Gerardos A, Dietler N, Bitbol A-F (2022) "Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences", PLoS Computational Biology 18(5):e1010147
[4] Gandarilla-Perez CA, Pinilla S, Bitbol A-F, Weigt M (2022) "Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins", https://arxiv.org/abs/2208.11626
[5] Lupo U, Sgarbossa D, Bitbol A-F (2022) "Protein language models trained on multiple sequence alignments learn phylogenetic relationships", https://arxiv.org/abs/2203.15465
[6] Sgarbossa D, Lupo U, Bitbol A-F (2022) "Generative power of a protein language model trained on multiple sequence alignments", https://arxiv.org/abs/2204.07110
Presenters
Anne-Florence Bitbol
EPFL, École Polytechnique Fédérale de Lausanne
Authors
Anne-Florence Bitbol
EPFL, École Polytechnique Fédérale de Lausanne