Enhancing Polymer Design with Fine-Tuned Large Language Models: Bridging Chemical and Natural Languages
ORAL
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing and are now being adapted to address scientific challenges in specific domains. This study expands LLM applications by fine-tuning models from the GPT series with polymer-specific datasets, enhancing their understanding of chemical structures and properties. By integrating chemical languages such as SMILES (Simplified Molecular Input Line Entry System) with natural language descriptions, we bridge the gap between chemical and natural language, establishing meaningful connections between polymer structures, nomenclature, and macroscopic properties. The fine-tuned model achieves over 95% accuracy in classifying polymer reactions, indicating its ability to effectively recognize polymer patterns based on structural and nomenclature inputs. Multi-task fine-tuning improves generalization compared to single-task approaches and shows significant accuracy gains over in-context learning methods like zero-shot and few-shot approaches. This strategy significantly boosts predictive accuracy and efficiency compared to general-purpose LLMs, paving the way for the design of novel polymer materials. Beyond polymers, this work suggests new possibilities for LLMs in scientific fields requiring the integration of formal and natural language information.
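As a rough illustration of the structure-to-language bridging described above (not the authors' actual pipeline), a single fine-tuning record might pair a monomer's SMILES string and name with a natural-language label for its polymerization class. The sketch below uses the OpenAI chat-style JSONL fine-tuning convention; the example monomer (styrene) and the class label are illustrative assumptions.

```python
import json

def make_record(smiles: str, name: str, reaction_class: str) -> str:
    """Build one JSONL fine-tuning record linking a monomer's SMILES
    and name (chemical language) to its polymerization class
    (natural language). Contents are illustrative only."""
    messages = [
        {"role": "system",
         "content": "You classify polymerization reactions."},
        {"role": "user",
         "content": f"Monomer: {name}, SMILES: {smiles}. "
                    "Which polymerization class does it undergo?"},
        {"role": "assistant", "content": reaction_class},
    ]
    return json.dumps({"messages": messages})

# Styrene polymerizes via chain-growth (radical) polymerization.
print(make_record("C=Cc1ccccc1", "styrene", "chain-growth"))
```

Thousands of such records, spanning multiple tasks (classification, property prediction, naming), would constitute the kind of multi-task fine-tuning set the abstract contrasts with single-task training.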
Presenters
-
Yuan Tian
University of Chicago
Authors
-
Yuan Tian
University of Chicago
-
Gustavo R. Perez-Lemus
University of Chicago
-
Pablo Zubieta
University of Chicago
-
Heyi Liang
University of Chicago
-
Juan J. de Pablo
University of Chicago