High fitness paths can connect proteins with low sequence overlap

ORAL

Abstract

We present a computational scheme to generate viable paths between homologous protein pairs through stepwise single-residue mutations. These paths are composed of intermediate sequences with high fitness as predicted by the protein language model ESM2. This requires four components: a means of generating sensible mutations for a given sequence, a computational proxy for fitness, a distance measure between protein pairs, and a search strategy to navigate the sequence space. We use the One Fell Swoop (OFS) approach to calculate the mutation profiles of the intermediates, and use these profiles as the proposal distribution from which candidate mutants are sampled. The fitness of each proposal is determined by its OFS pseudo-perplexity, while the proximity between two states is defined by their sequence alignment score, calculated from their ESM2 sequence embeddings. To navigate towards the target, at each iteration we choose, among the mutants with high predicted fitness, those closest to the target state. We use this scheme to interpolate between progressively divergent protein pairs, some of which do not even adopt the same structural fold, and document the qualitative variation across the generated paths. The ease of interpolating between two sequences, as quantified by some cost function that depends on the path length and the functional plausibility of the intermediates, could potentially be used as a proxy for the likelihood of homology between them.
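The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below is hypothetical, and each ESM2/OFS component is replaced by a toy stand-in. Proposals are drawn uniformly over single substitutions instead of from an OFS mutation profile, `toy_fitness` is an arbitrary placeholder for a pseudo-perplexity-based score, and Hamming distance on equal-length sequences stands in for the embedding-based alignment score. The fitness threshold and the rule of accepting only steps that move closer to the target are also assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq):
    """Placeholder fitness in [0, 1]; higher is better.

    Assumption: an arbitrary score (fraction of non-proline residues)
    standing in for a pseudo-perplexity-based fitness proxy.
    """
    return 1.0 - seq.count("P") / len(seq)


def hamming(a, b):
    """Toy distance for equal-length sequences.

    Assumption: stands in for an embedding-based alignment score.
    """
    if len(a) != len(b):
        raise ValueError("this toy distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def propose(seq, rng, n_proposals):
    """Sample single-residue substitutions uniformly at random.

    Assumption: a model-derived mutation profile would replace this
    uniform proposal distribution.
    """
    mutants = []
    for _ in range(n_proposals):
        pos = rng.randrange(len(seq))
        new_residue = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
        mutants.append(seq[:pos] + new_residue + seq[pos + 1:])
    return mutants


def greedy_path(start, target, fitness=toy_fitness, distance=hamming,
                n_proposals=50, min_fitness=0.5, max_steps=1000, seed=0):
    """Greedy walk from start towards target via single substitutions."""
    rng = random.Random(seed)
    current, path = start, [start]
    for _ in range(max_steps):
        if current == target:
            break
        # Keep only proposals whose predicted fitness clears the threshold.
        viable = [m for m in propose(current, rng, n_proposals)
                  if fitness(m) >= min_fitness]
        if not viable:
            continue
        # Among the viable mutants, take the one closest to the target.
        best = min(viable, key=lambda m: distance(m, target))
        # Accept the step only if it makes progress towards the target.
        if distance(best, target) < distance(current, target):
            current = best
            path.append(current)
    return path


if __name__ == "__main__":
    for step, seq in enumerate(greedy_path("MKTAYIAK", "MKSAWLAR")):
        print(step, seq, round(toy_fitness(seq), 3))
```

In this toy setting the path length equals the Hamming distance plus one, since every accepted step fixes exactly one mismatch. With a learned fitness proxy and an alignment-based distance, paths could be longer and could stall, which is what would make a path cost informative.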

Presenters

  • Pranav Kantroo

    Yale University

Authors

  • Pranav Kantroo

    Yale University

  • Gunter P Wagner

    Yale University

  • Ben Machta

    Yale University