Higher-Order Sequence Statistics in Protein Families: The Potts Model and MSA Transformer Showdown
POSTER
Abstract
Potts and Ising Hamiltonian models traditionally describe the physics of magnetic materials, but they have recently found important applications in understanding the biophysics of protein sequences. Recent "generative" machine learning models for protein sequences, including Potts models and MSA-Transformer (MSA-T), build on shared statistical insights but differ in their approaches. While Potts models assume only pairwise interactions between amino acids, MSA-T claims to capture effects induced by effective potentials beyond pairwise interactions, which could lead to superior performance in reproducing higher-order sequence statistics. We compare these models on the Kinase and RR Domain protein families and find that their relative performance depends on how phylogeny is treated. MSA-T performs well when no phylogenetic corrections are applied, but once phylogeny is accounted for, the Potts model outperforms MSA-T. Our findings suggest that MSA-T implicitly corrects for phylogeny in unweighted datasets, but that the physics-based Potts model better captures cooperative interactions of biophysical origin when phylogenetic relationships are considered.
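To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a pairwise Potts energy and of a common sequence-reweighting scheme used as a phylogenetic correction. It is an illustration under stated assumptions, not the authors' implementation: the function names `potts_energy` and `phylo_weights`, the sign convention E(S) = sum of fields plus sum of couplings (with P(S) proportional to exp(-E)), the integer encoding of residues, and the 0.8 identity threshold are all hypothetical choices made for this example.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Pairwise Potts energy of one integer-encoded sequence.

    Assumed convention: E(S) = sum_i h[i, s_i] + sum_{i<j} J[i, j, s_i, s_j],
    with sequence probability proportional to exp(-E). Other sign
    conventions are also in use.
    """
    L = len(seq)
    # Single-site field contributions.
    energy = sum(h[i, seq[i]] for i in range(L))
    # Pairwise coupling contributions, each position pair counted once.
    for i in range(L):
        for j in range(i + 1, L):
            energy += J[i, j, seq[i], seq[j]]
    return float(energy)

def phylo_weights(msa, identity_threshold=0.8):
    """Down-weight similar sequences (an assumed, commonly used scheme).

    Each sequence gets weight 1 / (number of sequences in the alignment,
    itself included, whose fractional identity to it is >= the threshold).
    """
    msa = np.asarray(msa)
    # Fractional identity between every pair of aligned sequences.
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    neighbors = (identity >= identity_threshold).sum(axis=1)
    return 1.0 / neighbors

# Toy usage: 3 positions, 2 residue types (hypothetical parameters).
h = np.zeros((3, 2))
h[0, 1] = 1.0
J = np.zeros((3, 3, 2, 2))
J[0, 1, 1, 1] = 0.5
toy_energy = potts_energy([1, 1, 0], h, J)

# Two identical sequences share their weight; the distinct one keeps weight 1.
toy_msa = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
toy_weights = phylo_weights(toy_msa)
```

In this sketch, an "unweighted" dataset corresponds to giving every sequence weight 1, while a phylogeny-corrected dataset uses the weights returned by `phylo_weights` when computing sequence statistics.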
Presenters
-
Kisan Khatri
Department of Physics, Temple University, Philadelphia, PA, USA
Authors
-
Kisan Khatri
Department of Physics, Temple University, Philadelphia, PA, USA
-
Ronald M Levy
Department of Physics, Department of Chemistry, Temple University, Philadelphia, PA, USA
-
Allan Haldane
Department of Physics, Temple University, Philadelphia, PA, USA