Leveraging large language models for predictive chemistry
ORAL · Invited
Abstract
Recent advances in machine learning have transformed many fields, including chemistry and materials science, where limited datasets often necessitate specialized, domain-specific techniques. In this work, we demonstrate that large language models (LLMs) trained on massive textual corpora can be fine-tuned for diverse predictive tasks in these fields—ranging from molecular and materials property prediction to chemical reaction yield forecasting and inverse design—without relying on elaborate feature engineering [1].
Specifically, we explore proprietary (e.g., GPT-3) and open-source (GPT-J-6B, Llama-3.1-8B, Mistral-7B) LLMs, benchmarking them against traditional machine learning approaches [2]. Our findings reveal that fine-tuning these models frequently yields performance comparable to, or even surpassing, conventional techniques, particularly in low-data regimes where domain-specific methods often struggle. Moreover, the ability to invert questions for generative design underscores the adaptability of LLMs. Crucially, preparing LLM-ready fine-tuning datasets from existing chemical or materials data is straightforward, and predictive models can be established with relatively modest training sets. These results demonstrate that large language models can streamline both exploratory and predictive tasks by virtue of their vast, implicit knowledge base.
As such, systematically deploying fine-tuned LLMs promises a powerful approach to guiding experiments, simulations, and design processes, substantially reducing experimental and computational overhead. This strategy also affords broader accessibility, enabling scientists in physics, chemistry, materials science, and related fields to exploit the collective knowledge encoded in these foundation models, potentially accelerating innovation and discovery.
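To make the dataset-preparation claim concrete, the sketch below shows one way a small property table could be cast into prompt/completion pairs for both the forward task (property prediction) and the inverted task (generative design). This is a minimal illustration, not the authors' code: the record fields, prompt wording, file name and the "###"/"@@@" separators are assumptions chosen for the example, not the exact format used in [1] or [2].

```python
import json

# Hypothetical records: a molecule (SMILES) paired with a measured property label.
records = [
    {"smiles": "CCO", "solubility_class": "high"},
    {"smiles": "c1ccccc1", "solubility_class": "low"},
]

def forward_example(rec):
    """Forward task: ask the model to predict the property of a given molecule."""
    return {
        "prompt": f"What is the solubility class of {rec['smiles']}?###",
        "completion": f" {rec['solubility_class']}@@@",
    }

def inverse_example(rec):
    """Inverted task: ask the model to propose a molecule with the target property."""
    return {
        "prompt": f"Give a molecule with {rec['solubility_class']} solubility.###",
        "completion": f" {rec['smiles']}@@@",
    }

# Write a JSONL file in the prompt/completion style commonly used for
# completion-based LLM fine-tuning; the "###" and "@@@" markers serve as
# end-of-prompt and stop sequences in this sketch.
with open("finetune.jsonl", "w") as fh:
    for rec in records:
        fh.write(json.dumps(forward_example(rec)) + "\n")
        fh.write(json.dumps(inverse_example(rec)) + "\n")
```

The same table thus yields training data for both prediction and inverse design simply by swapping which field appears in the prompt and which in the completion, which is the sense in which the abstract describes dataset preparation as straightforward.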
Publication:
[1] K. M. Jablonka, P. Schwaller, A. Ortega-Guerrero, and B. Smit, Leveraging large language models for predictive chemistry, Nat. Mach. Intell. 6, 161 (2024). http://dx.doi.org/10.1038/s42256-023-00788-1
[2] J. Van Herck et al., Assessment of fine-tuned large language models for real-world chemistry and material science applications, Chem. Sci. 16 (2), 670 (2025). http://dx.doi.org/10.1039/D4SC04401K
Presenters
Berend Smit
EPFL, Lausanne, Switzerland
Authors
Berend Smit
EPFL, Lausanne, Switzerland