Towards a theory of deep learning for hierarchical and compositional data
ORAL · Invited
Abstract
A theoretical understanding of deep learning requires accounting for the structure of the data that these methods learn so successfully. For instance, image- and text-like data display a hierarchical and compositional structure that deep learning methods can capture thanks to their layered architecture.
In this talk, I will present a strategy to model this structure based on probabilistic context-free grammars---tree-like generative models of text from theoretical linguistics. I will then describe how deep learning can leverage the hierarchical structure for learning with higher data efficiency than simpler, shallow machine-learning models. This analysis unveils a fundamental relationship between the data correlations, the latent hierarchical structure of the data and the size of the training set, leading to a prediction of learning curves that can be tested in experiments with deep convolutional networks and transformers.
Finally, I will present empirical evidence demonstrating that the relationship between training set size and correlations extends beyond our synthetic datasets. In the context of self-supervised learning, this relationship predicts a scaling form for the behaviour of learning curves as a function of the length of the input sequences.
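For readers unfamiliar with the modelling tool mentioned above, the following is a minimal sketch of sampling from a probabilistic context-free grammar, i.e. generating a sequence by expanding a tree of latent symbols top-down. The symbols, production rules and probabilities here are purely illustrative; they are not the grammar studied in the talk or in the linked paper.

    import random

    # Illustrative PCFG: each nonterminal maps to a list of (right-hand side, probability) pairs.
    rules = {
        "S":   [(("NP", "VP"), 1.0)],
        "NP":  [(("Det", "N"), 0.7), (("N",), 0.3)],
        "VP":  [(("V", "NP"), 0.6), (("V",), 0.4)],
        "Det": [(("the",), 0.6), (("a",), 0.4)],
        "N":   [(("cat",), 0.5), (("dog",), 0.5)],
        "V":   [(("sees",), 0.5), (("chases",), 0.5)],
    }

    def sample(symbol="S"):
        """Expand a symbol top-down; the leaves (terminals) form the observed sequence."""
        if symbol not in rules:  # terminal symbol: emit it
            return [symbol]
        expansions, weights = zip(*rules[symbol])
        rhs = random.choices(expansions, weights=weights)[0]
        leaves = []
        for child in rhs:
            leaves.extend(sample(child))
        return leaves

    print(" ".join(sample()))  # e.g. "the cat chases a dog"

The observed sequence (the leaves) inherits correlations from the latent tree that generated it, which is the structure the abstract argues deep networks exploit.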
–
Publication: Towards a theory of how the structure of language is acquired by deep neural networks (https://arxiv.org/abs/2406.00048)
Presenters
- Francesco Cagnetta
  EPFL, Scuola Internazionale Superiore di Studi Avanzati (SISSA)

Authors
- Francesco Cagnetta
  EPFL, Scuola Internazionale Superiore di Studi Avanzati (SISSA)