Energy-Based Models Capture Pairwise and Higher-Order Interactions in Protein Sequence Data
ORAL
Abstract
Understanding protein structure, evolution, and function requires reliable inference of interacting units in folded proteins. Here we present a unifying approach for inferring two of the most important structural units of proteins: pairwise contacts and higher-order strongly correlated units, known as sectors. Our method is a hybrid energy-based model that combines a pairwise-energy term, as used in state-of-the-art Direct Coupling Analysis, with a Restricted Boltzmann Machine (RBM) term meant to capture higher-order interactions. We show that, when trained on data from a biologically informed ground-truth model, our algorithms learn both the pairwise and the higher-order structure and are robust to varying levels of undersampling and to varying interaction strengths in the ground-truth distribution. We carry out the analysis for 2-spin and 10-spin systems with the Minimum Probability Flow and Ratio Matching algorithms, respectively. We comment on why the RBM succeeds at modeling the higher-order interactions and why certain hyperparameter choices (number of hidden units in the RBM, regularization strength) support the model's feature-detection capabilities.
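As an illustration of the hybrid energy described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes binary spins encoded as +/-1, hidden units that also take values +/-1 and are summed out analytically, and a Minimum Probability Flow objective with single-spin-flip connectivity. The names `hybrid_energy`, `mpf_objective`, `h`, `J`, `b`, and `W` are hypothetical and do not come from the abstract.

```python
import numpy as np


def hybrid_energy(s, h, J, b, W):
    """Energy of +/-1 spin configurations with the hidden units summed out.

    Assumed (illustrative) parameterization:
      s: (..., N) spins in {-1, +1}
      h: (N,) local fields
      J: (N, N) symmetric pairwise couplings with zero diagonal
      b: (M,) hidden-unit biases
      W: (M, N) visible-hidden weights
    """
    s = np.asarray(s, dtype=float)
    # Pairwise (DCA-like) term: -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j
    pairwise = -(s @ h) - 0.5 * np.einsum("...i,ij,...j->...", s, J, s)
    # RBM term after marginalizing +/-1 hidden units:
    # -sum_mu log(2 cosh(b_mu + sum_i W_mu,i s_i))
    hidden_input = s @ W.T + b
    rbm = -np.logaddexp(hidden_input, -hidden_input).sum(axis=-1)
    return pairwise + rbm


def mpf_objective(data, h, J, b, W):
    """Minimum Probability Flow objective with single-spin-flip neighbors.

    For each data point x and each one-flip neighbor x', accumulates
    exp((E(x) - E(x')) / 2), then averages over data points.
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_spins = data.shape
    e_data = hybrid_energy(data, h, J, b, W)
    total = 0.0
    for i in range(n_spins):
        flipped = data.copy()
        flipped[:, i] *= -1  # flip spin i in every sample
        e_flip = hybrid_energy(flipped, h, J, b, W)
        total += np.exp(0.5 * (e_data - e_flip)).sum()
    return total / n_samples
```

In this sketch the pairwise term plays the role of the Direct Coupling Analysis energy, while the summed-out hidden units contribute an effective energy that can couple many sites at once. Minimizing `mpf_objective` with respect to the parameters, typically with an added regularization penalty, avoids computing the partition function. A 10-state system would need a one-hot (Potts) encoding and a different objective, which this sketch does not cover.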
Presenters
-
Peter Fields
University of Chicago
Authors
-
Peter Fields
University of Chicago
-
Vudtiwat Ngampruetikorn
The Graduate Center, City University of New York
-
Stephanie E Palmer
University of Chicago
-
David J Schwab
The Graduate Center, CUNY