Searching for the Relevant Properties of Binary Datasets: Is your Model Truly Pairwise?

Clelia De Mulatier; Paolo Pietro Mazza; Matteo Marsili

ORAL

Abstract

Uncovering the patterns hidden within noisy data is essential to science. Information theory provides a quantitative method for selecting the best among potential explanations of the data by optimizing the balance between goodness-of-fit and simplicity. Yet in practice, finding “the” best model for a given dataset is impossible. A common practical issue is the huge number of potential models. But with a finite amount of data, the real limitation comes from the large degeneracy of models that perform nearly optimally. We illustrate this problem on examples of binary data, using a heuristic procedure to perform an efficient search among all spin models with high-order interactions. As good models tend to share a common sub-structure that is likely to capture relevant properties of the data, we focus our search on this structure rather than on finding the strictly best model. We show that minimally complex spin models are useful for this task. We obtain an analytic expression for their posterior probability, which makes them easy to fit and exactly comparable. We then show that working with equivalence classes of these models makes it possible (a) to find the spin basis in which the dependencies between basis variables are minimal and (b) to quantify these dependencies and the relevance of each dimension.

March 5, 2020, 11:36 AM – March 5, 2020, 11:48 AM

Presenters

Clelia De Mulatier

University of Pennsylvania

Authors

Clelia De Mulatier

University of Pennsylvania

Paolo Pietro Mazza

Institute for Theoretical Physics, University of Tübingen

Matteo Marsili

International Centre for Theoretical Physics