Explaining High-order Interactions in Protein Language Models
ORAL
Abstract
Protein language models (PLMs) leverage evolutionary information to perform state-of-the-art 3D structure and zero-shot variant prediction. Yet, extracting and explaining the high-order interactions that govern their predictions remains challenging, as it requires querying the entire amino acid sequence space with an exponential number of sequences. In this talk, we will empirically analyze protein language models through two new notions: sparsity and ruggedness. We observe that the behavior of PLMs is dominated by distinct operating regions in the sparsity-ruggedness plane. We then discuss a fast algorithm we developed to extract high-order interactions from the sparse and rugged operating regions of PLMs using a number of query sequences that grows only linearly with the input dimension. Our work opens new algorithmic avenues to better understand large language models.
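To make the notions of high-order interactions, sparsity, and ruggedness concrete, below is a minimal, illustrative sketch rather than the algorithm presented in the talk: it brute-forces the Walsh-Hadamard coefficients of a PLM landscape over a handful of candidate mutation sites and reports simple sparsity and ruggedness proxies. The `plm_score` function is a placeholder standing in for a real PLM scorer (e.g., an ESM log-likelihood), the chosen sites and sequence are toy values, and the sparsity/ruggedness definitions here are illustrative assumptions that may differ from those in the paper.

```python
import itertools
import numpy as np

def plm_score(sequence: str) -> float:
    """Placeholder scorer: swap in a real PLM log-likelihood (e.g., from ESM)."""
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(sequence)) % (2**32)
    rng = np.random.default_rng(seed)
    return float(rng.normal())

def mutate(wild_type: str, sites, pattern, alt: str = "A") -> str:
    """Set each chosen site to a fixed alternative residue wherever the pattern bit is 1."""
    seq = list(wild_type)
    for site, bit in zip(sites, pattern):
        if bit:
            seq[site] = alt
    return "".join(seq)

def walsh_hadamard(scores: np.ndarray, k: int) -> np.ndarray:
    """Exact Walsh-Hadamard transform over the 2^k binary mutation neighborhood."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H @ scores / (2 ** k)

wild_type = "MKTAYIAKQR"                   # toy sequence
sites = [1, 3, 5]                          # k = 3 candidate sites -> 2^3 = 8 queries
patterns = list(itertools.product([0, 1], repeat=len(sites)))

scores = np.array([plm_score(mutate(wild_type, sites, p)) for p in patterns])
coeffs = walsh_hadamard(scores, len(sites))
orders = np.array([sum(p) for p in patterns])  # interaction order of each coefficient

energy = coeffs ** 2
sparsity = float(np.mean(energy > 1e-3 * energy.sum()))       # fraction of non-negligible terms
ruggedness = float(energy[orders >= 2].sum() / energy.sum())  # energy in pairwise-and-higher terms
print(f"sparsity proxy ~ {sparsity:.2f}, ruggedness proxy ~ {ruggedness:.2f}")
```

The exhaustive enumeration over 2^k mutation patterns is exactly the exponential cost the talk's algorithm avoids; it is kept here only because it makes the interaction coefficients and the two proxies easy to read off for a small neighborhood.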
–
Publication: Tsui, Darin, and Amirali Aghazadeh. "On Recovering Higher-order Interactions from Protein Language Models." arXiv preprint arXiv:2405.06645 (2024).
Presenters
-
Amirali Aghazadeh
Georgia Institute of Technology
Authors
-
Amirali Aghazadeh
Georgia Institute of Technology
-
Darin Tsui
Georgia Institute of Technology