Machine Learning-Enhanced Predictions of Transcription Factor-DNA Binding Stability in Genomic Contexts
ORAL
Abstract
Transcription factors (TFs) of the basic helix-loop-helix leucine zipper (bHLHLZ) family, such as Myc-Max, Omomyc, and Max-Max, regulate gene expression by binding specific DNA sequences, thereby controlling processes such as cell growth and differentiation. Our previous all-atom molecular dynamics (MD) simulations revealed significant differences in the stability of these TFs when bound to the E-box (CACGTG) versus a nonspecific poly-A DNA sequence. Traditional methods such as MMGBSA capture some aspects of these interactions. However, because they sum energy terms additively, they cannot account for nonlinear dependencies among those terms, which limits their accuracy in predicting binding free energies.
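For orientation, the textbook MM/GBSA estimate is a plain sum of energy terms. The split below is the standard decomposition and not necessarily the exact set of terms used in this work. The symbols $E_{\text{HB}}$ (hydrogen-bonding energy) and $f_{\theta}$ (a generic learned nonlinear function with parameters $\theta$) are illustrative notation introduced here, not taken from the authors' model.

```latex
% Additive (MM/GBSA-style) estimate: each term enters with unit weight
\Delta G_{\text{bind}} \approx \Delta E_{\text{elec}} + \Delta E_{\text{vdW}}
  + \underbrace{\Delta G_{\text{GB}} + \Delta G_{\text{SA}}}_{\Delta G_{\text{solv}}}
  - T\Delta S

% Learned alternative: the same terms as inputs to a nonlinear function
\Delta G_{\text{pred}} = f_{\theta}\!\left(\Delta E_{\text{elec}},\,
  \Delta E_{\text{vdW}},\, \Delta G_{\text{solv}},\, -T\Delta S,\, E_{\text{HB}}\right)
```

The additive form cannot represent couplings such as a solvation penalty whose size depends on the electrostatic term. A nonlinear $f_{\theta}$ can represent such couplings.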
To address this, we developed a machine learning (ML) model trained on MMGBSA-derived energy terms, including electrostatic, van der Waals, solvation, entropic, and hydrogen-bonding contributions, to improve free energy predictions. Our artificial neural network (ANN) was designed to predict TF-DNA binding outcomes in genomic context protein binding microarray (gcPBM) data. Unlike traditional models that consider only single point mutations, our ANN captures both the dynamic and the energetic landscapes and evaluates multiple mutations simultaneously.
By learning nonlinear relationships among the energy terms, our ANN significantly outperformed additive methods and provided more accurate predictions of TF-DNA stability. The model also offers rapid, scalable free energy estimates, which makes it well suited to large-scale genomic studies and, more broadly, to understanding protein-DNA binding dynamics.
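The following minimal sketch illustrates why a learned nonlinear model can beat an additive one. It runs on synthetic data only and does not use the authors' model, features, or gcPBM data. The feature order, the single coupling term in the target, the network size, and the choice of a hand-written Adam optimizer are all assumptions made for illustration. The sketch fits an additive least-squares baseline and a one-hidden-layer tanh network to the same five hypothetical "energy terms", then compares their held-out mean squared error.

```python
import numpy as np

# Hypothetical feature order (one row per TF-DNA complex):
# [dE_elec, dE_vdW, dG_solv, -TdS, E_hbond]  -- synthetic values, not MMGBSA output
rng = np.random.default_rng(0)

def make_data(n):
    X = rng.standard_normal((n, 5))
    # Assumed target: additive part plus ONE coupling term (elec x solv)
    y = X.sum(axis=1) + X[:, 0] * X[:, 2]
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(500)

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

# Additive baseline: weighted sum of terms plus intercept, fit by least squares
A_train = np.column_stack([X_train, np.ones(len(X_train))])
w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([X_test, np.ones(len(X_test))])
linear_mse = mse(A_test @ w, y_test)

# One-hidden-layer ANN (tanh), trained full-batch with a hand-written Adam
H = 32
params = {
    "W1": rng.standard_normal((5, H)) / np.sqrt(5),
    "b1": np.zeros(H),
    "W2": rng.standard_normal((H, 1)) / np.sqrt(H),
    "b2": np.zeros(1),
}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, (h @ p["W2"] + p["b2"]).ravel()

m_state = {k: np.zeros_like(val) for k, val in params.items()}
v_state = {k: np.zeros_like(val) for k, val in params.items()}
lr, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8
n = len(X_train)

for t in range(1, 3001):
    h, pred = forward(params, X_train)
    err = (pred - y_train)[:, None] * (2.0 / n)      # dMSE/dpred, shape (n, 1)
    grads = {"W2": h.T @ err, "b2": err.sum(axis=0)}
    dh = (err @ params["W2"].T) * (1.0 - h ** 2)      # backprop through tanh
    grads["W1"] = X_train.T @ dh
    grads["b1"] = dh.sum(axis=0)
    for k in params:                                   # Adam update per tensor
        m_state[k] = beta1 * m_state[k] + (1 - beta1) * grads[k]
        v_state[k] = beta2 * v_state[k] + (1 - beta2) * grads[k] ** 2
        m_hat = m_state[k] / (1 - beta1 ** t)
        v_hat = v_state[k] / (1 - beta2 ** t)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

ann_mse = mse(forward(params, X_test)[1], y_test)
print(f"additive baseline test MSE: {linear_mse:.3f}")
print(f"ANN test MSE:               {ann_mse:.3f}")
```

The additive baseline cannot express the coupling term, so its error stays near the variance of that term. The network can approximate the coupling. Once trained, the network's inference is a few matrix products per batch, which is the kind of speed a large-scale screen relies on.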
Publication: Machine Learning-Enhanced Predictions of Transcription Factor-DNA Binding Stability in Genomic Contexts (planned paper)
Presenters
Carmen Al Masri
University of California, Irvine
Authors
Carmen Al Masri
University of California, Irvine
Jin Yu
University of California, Irvine