Scalable and interpretable machine learning for inference in stochastic transcriptional systems
ORAL
Abstract
Advances in experimental techniques have enabled the simultaneous quantification of multiple molecular readouts, such as genomes, transcriptomes, and proteomes, in millions of individual cells. Integrating these high-dimensional, multimodal data is a key outstanding biological problem, and it requires mechanistic models for biological interpretability.
We use physics-informed machine learning to develop a scalable and general approach for high-throughput inference of transcriptional system kinetics. Multimodal data can be integrated by defining models of transcriptional dynamics that parameterize multi-species biophysical processes. For example, the mammalian RNA life cycle includes transcription, splicing, and degradation of individual molecules. Such discrete systems can be modeled using chemical master equations, but these equations do not afford analytical solutions. We use a neural network to approximate the steady-state distributions of this model and employ this differentiable approximation in a variational autoencoder to fit simulated and experimental data with unspliced and spliced RNA counts. The approximation and inference techniques can be extended to a variety of discrete physical systems, presenting opportunities for high-dimensional, mechanistic analyses beyond biology.
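To make the kind of discrete system described above concrete, the following is a minimal sketch of a Gillespie stochastic simulation of transcription, splicing, and degradation. It is not the authors' model or code. It assumes a simplified constitutive variant with constant transcription rate `k`, splicing rate `beta`, and degradation rate `gamma`; the function name and all parameter names are hypothetical. This variant is chosen because its steady state is known (independent Poisson distributions with means `k/beta` for unspliced and `k/gamma` for spliced counts), so the simulation can be checked against theory, unlike the analytically intractable case the abstract addresses.

```python
import random


def simulate_rna(k, beta, gamma, t_end, seed=0):
    """Gillespie simulation of a constitutive unspliced/spliced RNA model.

    Reactions (all rates are illustrative assumptions):
      transcription: 0 -> U   at rate k
      splicing:      U -> S   at rate beta * u
      degradation:   S -> 0   at rate gamma * s
    Returns the time-averaged unspliced and spliced counts over [0, t_end].
    """
    rng = random.Random(seed)
    t, u, s = 0.0, 0, 0
    u_area = 0.0  # integral of u(t) dt
    s_area = 0.0  # integral of s(t) dt

    while t < t_end:
        rates = (k, beta * u, gamma * s)
        total = sum(rates)  # always > 0 because k > 0

        # Waiting time to the next reaction, truncated at the end of the run.
        dt = min(rng.expovariate(total), t_end - t)
        u_area += u * dt
        s_area += s * dt
        t += dt
        if t >= t_end:
            break

        # Pick which reaction fires, proportionally to its rate.
        r = rng.random() * total
        if r < rates[0]:
            u += 1              # transcription
        elif r < rates[0] + rates[1]:
            u -= 1              # splicing converts U into S
            s += 1
        elif s > 0:
            s -= 1              # degradation (guard against rounding edge cases)

    return u_area / t_end, s_area / t_end


if __name__ == "__main__":
    u_mean, s_mean = simulate_rna(k=10.0, beta=2.0, gamma=1.0, t_end=2000.0, seed=1)
    print(f"time-averaged unspliced: {u_mean:.2f} (theory {10.0 / 2.0:.2f})")
    print(f"time-averaged spliced:   {s_mean:.2f} (theory {10.0 / 1.0:.2f})")
```

With the rates above, the time averages should fall close to the Poisson means of 5 and 10. Simulations of this kind are one common way to generate data for testing an approximate steady-state distribution when no closed form is available.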
–
Publication: M. T. Carilli, G. Gorin, T. Chari, L. Pachter. (2022). bioRxiv.
Presenters
-
Maria Carilli
Caltech
Authors
-
Maria Carilli
Caltech