Contrastive Learning Reveals the Trajectory of Protein Structure Evolution

ORAL

Abstract

The three-dimensional structure of a protein can be represented by the spatial distances between all pairs of amino acid residues, which form a symmetric matrix called a contact map. This work investigates two categories of protein structure evolution data, each a sequence of contact maps: (1) lysozyme adsorption on a graphene surface, obtained from discontinuous molecular dynamics (DMD) simulations, and (2) binding of the human cell receptor ACE2 to the wild-type SARS-CoV-2 spike protein and to key mutants, obtained from large-scale all-atom explicit-solvent molecular dynamics simulations. A contrastive learning model learns feature representations of the contact maps by using a loss function to maximize the agreement between the two members of a positive pair (x_i, x_j), where x_i and x_j are correlated views of the same contact map x generated by the stochastic data augmentations τ ~ T and τ' ~ T, respectively. The extracted feature representations are then grouped by k-means clustering to reveal the stages of the protein structure evolution trajectory. The results show that the stages identified by the contrastive learning models help characterize the protein folding path during adsorption and the allosteric regulation mechanism of the SARS-CoV-2 spike protein during receptor-binding domain (RBD)-ACE2 binding.
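The two core ingredients described above, building a contact map and scoring positive pairs with a contrastive loss, can be illustrated with a minimal NumPy sketch. The abstract does not name the loss function or the encoder; the NT-Xent loss used here (as in SimCLR) is an assumption suggested by the positive-pair notation, and the function names `contact_map` and `nt_xent_loss`, the temperature value, the noise augmentation, and the flattened-map "embedding" are all hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

def contact_map(coords):
    """Symmetric matrix of pairwise residue distances; coords has shape (n_residues, 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss (assumed, not confirmed by the abstract) for N positive pairs.

    z_a[k] and z_b[k] are embeddings of two views of the same contact map;
    both arrays have shape (N, d).
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    # Row k's positive partner sits n rows away in the concatenated batch.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)           # stabilise the log-softmax
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy usage: 4 random "frames" of a 10-residue chain (synthetic data, not simulation output).
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 10, 3))
maps = np.stack([contact_map(f) for f in frames])

# Hypothetical augmentation: small additive noise stands in for the augmentations tau and tau'.
view_a = maps + 0.01 * rng.normal(size=maps.shape)
view_b = maps + 0.01 * rng.normal(size=maps.shape)

# Flattened maps stand in for the output of a learned encoder.
loss = nt_xent_loss(view_a.reshape(4, -1), view_b.reshape(4, -1))
print(f"contrastive loss on toy views: {loss:.4f}")
```

In a full pipeline, the flattened maps would be replaced by the output of a trained encoder, and the resulting embeddings would then be passed to k-means clustering to assign each frame to a stage.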

Presenters

  • Yong Wei

    High Point University

Authors

  • Yong Wei

    High Point University

  • Baofu Qiao

    City University of New York

  • Tao Wei

    Howard University

  • Hanning Chen

    The University of Texas at Austin