Enhancing Polymer Design with Fine-Tuned Large Language Models: Bridging Chemical and Natural Languages
ORAL
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing and are now being adapted to address scientific challenges in specific domains. This study expands LLM applications by fine-tuning models from the GPT series with polymer-specific datasets, enhancing their understanding of chemical structures and properties. By integrating chemical languages such as SMILES (Simplified Molecular Input Line Entry System) with natural language descriptions, we bridge the gap between chemical and natural language, establishing meaningful connections between polymer structures, nomenclature, and macroscopic properties. The fine-tuned model achieves over 95% accuracy in classifying polymer reactions, indicating its ability to effectively recognize polymer patterns based on structural and nomenclature inputs. Multi-task fine-tuning improves generalization compared to single-task approaches and shows significant accuracy gains over in-context learning methods like zero-shot and few-shot approaches. This strategy significantly boosts predictive accuracy and efficiency compared to general-purpose LLMs, paving the way for the design of novel polymer materials. Beyond polymers, this work suggests new possibilities for LLMs in scientific fields requiring the integration of formal and natural language information.
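As a rough illustration of the structure-to-language bridging described above (not the authors' actual pipeline), a single fine-tuning record might pair a monomer's SMILES string and name with a natural-language label for its polymerization class. The sketch below uses the OpenAI chat-style JSONL fine-tuning convention; the example monomer (styrene) and the class label are illustrative assumptions.

```python
import json

def make_record(smiles: str, name: str, reaction_class: str) -> str:
    """Build one JSONL fine-tuning record linking a monomer's SMILES
    and name (chemical language) to its polymerization class
    (natural language). Contents are illustrative only."""
    messages = [
        {"role": "system",
         "content": "You classify polymerization reactions."},
        {"role": "user",
         "content": f"Monomer: {name}, SMILES: {smiles}. "
                    "Which polymerization class does it undergo?"},
        {"role": "assistant", "content": reaction_class},
    ]
    return json.dumps({"messages": messages})

# Styrene polymerizes via chain-growth (radical) polymerization.
print(make_record("C=Cc1ccccc1", "styrene", "chain-growth"))
```

Thousands of such records, spanning multiple tasks (classification, property prediction, naming), would constitute the kind of multi-task fine-tuning set the abstract contrasts with single-task training.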
Presenters
-
Yuan Tian
University of Chicago
Authors
-
Yuan Tian
University of Chicago
-
Gustavo R. Perez-Lemus
University of Chicago
-
Pablo Zubieta
University of Chicago
-
Heyi Liang
University of Chicago
-
Juan J. de Pablo
University of Chicago