Searching for the Relevant Properties of Binary Datasets: Is your Model Truly Pairwise?

ORAL

Abstract

Uncovering the patterns hidden within noisy data is essential to science. Information theory provides a quantitative method for selecting the best among candidate explanations of the data by optimizing the balance between goodness-of-fit and simplicity. Yet in practice, finding “the” best model for a given dataset is impossible. A common practical issue is the huge number of potential models. But with a finite amount of data, the real limitation comes from the large degeneracy of models that perform nearly optimally. We illustrate this problem on examples of binary data, using a heuristic procedure to search efficiently among all spin models with high-order interactions. Because good models tend to share a common sub-structure that is likely to capture relevant properties of the data, we focus our search on this structure rather than on finding the strictly best model. We show that minimally complex spin models are useful for this task. We obtain an analytic expression for their posterior probability, which makes them easy to fit and exactly comparable. We then show that working with equivalence classes of these models makes it possible (a) to find the spin basis in which the dependencies between basis variables are minimal and (b) to quantify these dependencies and the relevance of each dimension.
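As a rough illustration of why a closed-form evidence makes models "exactly comparable", the sketch below scores two candidate groupings of binary variables. It assumes, for illustration only, that a model splits the variables into independent groups and places a symmetric Dirichlet prior with parameter 1/2 (the Jeffreys prior) on the joint states of each group, which gives the standard Dirichlet-multinomial marginal likelihood. The function names, the toy dataset, and the choice of prior are assumptions of this sketch and are not necessarily the expression or the procedure used by the authors.

```python
from collections import Counter
from math import lgamma


def log_evidence_block(data, block, alpha=0.5):
    """Log marginal likelihood of the joint states of the variables in `block`.

    Assumes a symmetric Dirichlet(alpha) prior over the 2**len(block) joint
    states; alpha = 0.5 corresponds to the Jeffreys prior (an assumption of
    this sketch).
    """
    n_states = 2 ** len(block)
    n_samples = len(data)
    # Count how often each joint state of the block is observed.
    counts = Counter(tuple(row[i] for i in block) for row in data)
    out = lgamma(alpha * n_states) - lgamma(n_samples + alpha * n_states)
    # Unobserved states contribute lgamma(alpha) - lgamma(alpha) = 0.
    for k in counts.values():
        out += lgamma(k + alpha) - lgamma(alpha)
    return out


def log_evidence_partition(data, partition):
    """Log evidence of a model whose blocks are mutually independent."""
    return sum(log_evidence_block(data, block) for block in partition)


# Hypothetical toy data: variables 0 and 1 always agree, variable 2 is free.
data = [(0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)] * 10

coupled = [[0, 1], [2]]          # models the dependence between 0 and 1
independent = [[0], [1], [2]]    # treats all three variables as independent

print("coupled    :", log_evidence_partition(data, coupled))
print("independent:", log_evidence_partition(data, independent))
```

On this toy dataset the grouping that couples variables 0 and 1 receives the higher log evidence, because the gain in fit outweighs the cost of the extra parameters. No fitting step is needed, since the score depends only on the observed state counts.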

Presenters

  • Clelia De Mulatier

    University of Pennsylvania

Authors

  • Clelia De Mulatier

    University of Pennsylvania

  • Paolo Pietro Mazza

    Institute for Theoretical Physics, University of Tübingen

  • Matteo Marsili

    International Centre for Theoretical Physics