Explaining High-order Interactions in Protein Language Models
ORAL
Abstract
Protein language models (PLMs) leverage evolutionary information to perform state-of-the-art 3D structure and zero-shot variant prediction. Yet, extracting and explaining the high-order interactions that govern their predictions remains challenging, as it requires querying the entire amino acid sequence space with an exponential number of sequences. In this talk, we will empirically analyze protein language models through two new notions: sparsity and ruggedness. We observe that the behavior of PLMs is dominated by distinct operating regions in the sparsity-ruggedness plane. We then discuss a fast algorithm we developed to extract high-order interactions from the sparse and rugged operating regions of PLMs using a number of query sequences that grows only linearly with the input dimension. Our work opens new algorithmic avenues to better understand large language models.
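To make the notions of high-order interactions, sparsity, and ruggedness concrete, below is a minimal, illustrative sketch rather than the algorithm presented in the talk: it brute-forces the Walsh-Hadamard coefficients of a PLM landscape over a handful of candidate mutation sites and reports simple sparsity and ruggedness proxies. The `plm_score` function is a placeholder standing in for a real PLM scorer (e.g., an ESM log-likelihood), the chosen sites and sequence are toy values, and the sparsity/ruggedness definitions here are illustrative assumptions that may differ from those in the paper.

```python
import itertools
import numpy as np

def plm_score(sequence: str) -> float:
    """Placeholder scorer: swap in a real PLM log-likelihood (e.g., from ESM)."""
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(sequence)) % (2**32)
    rng = np.random.default_rng(seed)
    return float(rng.normal())

def mutate(wild_type: str, sites, pattern, alt: str = "A") -> str:
    """Set each chosen site to a fixed alternative residue wherever the pattern bit is 1."""
    seq = list(wild_type)
    for site, bit in zip(sites, pattern):
        if bit:
            seq[site] = alt
    return "".join(seq)

def walsh_hadamard(scores: np.ndarray, k: int) -> np.ndarray:
    """Exact Walsh-Hadamard transform over the 2^k binary mutation neighborhood."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H @ scores / (2 ** k)

wild_type = "MKTAYIAKQR"                   # toy sequence
sites = [1, 3, 5]                          # k = 3 candidate sites -> 2^3 = 8 queries
patterns = list(itertools.product([0, 1], repeat=len(sites)))

scores = np.array([plm_score(mutate(wild_type, sites, p)) for p in patterns])
coeffs = walsh_hadamard(scores, len(sites))
orders = np.array([sum(p) for p in patterns])  # interaction order of each coefficient

energy = coeffs ** 2
sparsity = float(np.mean(energy > 1e-3 * energy.sum()))       # fraction of non-negligible terms
ruggedness = float(energy[orders >= 2].sum() / energy.sum())  # energy in pairwise-and-higher terms
print(f"sparsity proxy ~ {sparsity:.2f}, ruggedness proxy ~ {ruggedness:.2f}")
```

The exhaustive enumeration over 2^k mutation patterns is exactly the exponential cost the talk's algorithm avoids; it is kept here only because it makes the interaction coefficients and the two proxies easy to read off for a small neighborhood.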
–
Publication: Tsui, Darin, and Amirali Aghazadeh. "On Recovering Higher-order Interactions from Protein Language Models." arXiv preprint arXiv:2405.06645 (2024).
Presenters
-
Amirali Aghazadeh
Georgia Institute of Technology
Authors
-
Amirali Aghazadeh
Georgia Institute of Technology
-
Darin Tsui
Georgia Institute of Technology