
Probing the Latent Hierarchical Structure of Data via Diffusion Models

ORAL · Invited

Abstract

High-dimensional data must be highly structured to be learnable. Although the compositional and hierarchical nature of data is often put forward to explain learnability, quantitative measurements establishing these properties are scarce. Likewise, accessing the latent variables underlying such a data structure remains a challenge. In this work, we show that forward-backward experiments in diffusion-based models, where data is noised and then denoised to generate new samples, allow one to probe the latent structure of data. We predict in simple hierarchical models that changes of latent variables at different levels of abstraction can be triggered by varying the noise level. As a consequence, changes in the data during this process occur in correlated chunks, with a length scale that diverges at a noise level where a phase transition takes place. Remarkably, we confirm this prediction in both text and image datasets using state-of-the-art diffusion models.
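The measurement described above, in which the changes between an original sample and its noised-then-denoised counterpart cluster into correlated chunks, can be illustrated with a minimal sketch. It assumes discrete tokens, uses a hypothetical uniform-resampling forward process (`forward_noise`, resampling probability `t`), and quantifies chunking with a connected correlation function of the binary change mask. The denoising (backward) step would require a trained diffusion model and is not included. All function names are illustrative; this is not the authors' code or their exact observable.

```python
import random
from typing import List, Sequence


def forward_noise(tokens: Sequence[int], vocab_size: int, t: float,
                  rng: random.Random) -> List[int]:
    """Hypothetical forward process: each token is independently replaced by a
    uniformly random token from the vocabulary with probability t (0 <= t <= 1)."""
    return [rng.randrange(vocab_size) if rng.random() < t else tok for tok in tokens]


def change_mask(original: Sequence[int], regenerated: Sequence[int]) -> List[int]:
    """Binary mask: 1 where the regenerated sample differs from the original, else 0."""
    if len(original) != len(regenerated):
        raise ValueError("sequences must have equal length")
    return [int(a != b) for a, b in zip(original, regenerated)]


def change_correlation(masks: Sequence[Sequence[int]], r: int) -> float:
    """Connected correlation C(r) = <d_i d_{i+r}> - <d_i><d_{i+r}>.

    Expectations <.> are taken over the samples (one mask per sample); the
    result is then averaged over all positions i with i + r inside the sequence.
    """
    n = len(masks[0])
    if not 0 <= r < n:
        raise ValueError("r must satisfy 0 <= r < sequence length")
    num = len(masks)
    total = 0.0
    for i in range(n - r):
        a = [m[i] for m in masks]
        b = [m[i + r] for m in masks]
        mean_a = sum(a) / num
        mean_b = sum(b) / num
        mean_ab = sum(x * y for x, y in zip(a, b)) / num
        total += mean_ab - mean_a * mean_b
    return total / (n - r)
```

In an actual forward-backward experiment, each mask would come from comparing an original sample with its regenerated version at a fixed noise level. The distance over which C(r) remains positive then gives a rough chunk length that can be tracked as the noise level is varied.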

Publication: Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Itzhak Levi, Matthieu Wyart
arXiv preprint arXiv:2410.13770

A phase transition in diffusion models reveals the hierarchical nature of data
A Sclocchi, A Favero, M Wyart
arXiv preprint arXiv:2402.16991 (to appear in PNAS)

Presenters

  • Matthieu Wyart

    Johns Hopkins University

Authors

  • Matthieu Wyart

    Johns Hopkins University