
Towards a theory of deep learning for hierarchical and compositional data

ORAL · Invited

Abstract

The theoretical understanding of deep learning methods requires us to consider the structure of the data that these methods are successful at learning. For instance, image- and text-like data display a hierarchical and compositional structure that deep learning methods can capture due to their layered architecture.

In this talk, I will present a strategy for modelling this structure based on probabilistic context-free grammars---tree-like generative models of text from theoretical linguistics. I will then describe how deep learning leverages the hierarchical structure to learn with higher data efficiency than simpler, shallow machine-learning models. This analysis unveils a fundamental relationship between the data correlations, the latent hierarchical structure, and the size of the training set, yielding a prediction for the learning curves that can be tested in experiments with deep convolutional networks and transformers.
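To make the generative model concrete, here is a minimal sketch of sampling from a probabilistic context-free grammar. The grammar below is a hypothetical toy example (its symbols, rules, and probabilities are illustrative only, not the hierarchical grammars studied in the talk): each nonterminal expands into a sequence of symbols chosen according to rule probabilities, so every sentence is the leaf sequence of a latent derivation tree.

```python
import random

# Toy PCFG: each nonterminal maps to weighted production rules.
# Symbols without an entry in RULES are terminals.
# All rules and probabilities here are illustrative, not from the talk.
RULES = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.7), (("the", "A", "N"), 0.3)],
    "VP": [(("V",), 0.6), (("V", "NP"), 0.4)],
    "N":  [(("cat",), 0.5), (("dog",), 0.5)],
    "A":  [(("big",), 0.5), (("small",), 0.5)],
    "V":  [(("sleeps",), 0.5), (("sees",), 0.5)],
}

def sample(symbol="S"):
    """Recursively expand `symbol`, returning a list of terminal tokens."""
    if symbol not in RULES:  # terminal symbol: emit as-is
        return [symbol]
    expansions, weights = zip(*RULES[symbol])
    chosen = random.choices(expansions, weights=weights)[0]
    # Concatenate the expansions of each child symbol (the derivation tree).
    return [tok for child in chosen for tok in sample(child)]

print(" ".join(sample()))  # e.g. "the cat sees the small dog"
```

Because expansions are local (each rule sees only its parent symbol), tokens far apart in the sentence are correlated only through shared ancestors in the tree, which is the structural property the analysis in the talk exploits.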

Finally, I will present empirical evidence that the relationship between training-set size and correlations extends beyond our synthetic datasets. In the context of self-supervised learning, it predicts a scaling form for the learning curves as a function of the length of the input sequences.

Publication: Towards a theory of how the structure of language is acquired by deep neural networks (https://arxiv.org/abs/2406.00048)

Presenters

  • Francesco Cagnetta

    EPFL, Scuola Internazionale Superiore di Studi Avanzati (SISSA)

Authors

  • Francesco Cagnetta

    EPFL, Scuola Internazionale Superiore di Studi Avanzati (SISSA)