Graph and Generative Large Language Models for Data-Driven Materials Discovery
ORAL
Abstract
Existing materials datasets, such as the Materials Project and Automatic FLOW for Materials Discovery (AFLOW), are invaluable for leveraging machine learning (ML) to accelerate the discovery of new materials. However, the diverse compound compositions and crystal structures across these datasets pose a significant challenge, demanding ML models with robust generalization capabilities. Furthermore, the limited number of compounds within specific categories of interest impedes the training of accurate models tailored to targeted applications.
To address the challenge of generalization across diverse materials, we propose a graph convolution algorithm, Chemical Environment Adaptive Learning with Learnable Weighting Functions (CEAL-WF). CEAL-WF employs a set of aggregators to capture key atomic interactions within local chemical environments. Its dynamic learnable weighting functions adjust the influence of aggregated messages from neighboring atoms, improving model performance across a broad range of materials.
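The core idea of combining multiple neighbor aggregators under a learnable, environment-dependent weighting can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and a simple linear-plus-softmax gate; the function and parameter names (`ceal_wf_convolution`, `W_gate`) are hypothetical and do not reflect the authors' actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ceal_wf_convolution(center, neighbors, W_gate):
    """Toy sketch of one CEAL-WF-style convolution step (illustrative only).

    Several aggregators summarize the neighbor messages; a learnable
    gate (here, a linear map on the center atom's features followed by
    a softmax) weights each aggregator's contribution, so the model can
    adapt to different local chemical environments.
    """
    # Candidate aggregations over the neighbor feature vectors
    agg = np.stack([
        neighbors.mean(axis=0),   # average environment signal
        neighbors.max(axis=0),    # strongest interaction
        neighbors.sum(axis=0),    # total interaction strength
    ])                            # shape: (3, feature_dim)

    # Learnable weighting conditioned on the center atom's features
    weights = softmax(W_gate @ center)  # shape: (3,), sums to 1

    # Weighted combination of the aggregated messages
    return (weights[:, None] * agg).sum(axis=0)

# Toy example: one atom with three neighbors, 4-dimensional features
rng = np.random.default_rng(0)
center = rng.normal(size=4)
neighbors = rng.normal(size=(3, 4))
W_gate = rng.normal(size=(3, 4))  # maps center features to 3 gate scores
out = ceal_wf_convolution(center, neighbors, W_gate)
print(out.shape)  # (4,)
```

In a full model, `W_gate` would be trained end-to-end with the rest of the network, letting the gate shift emphasis among aggregators per atom.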
To overcome the limitation of data scarcity in specific material categories, we propose a generative large language model (LLM) trained on materials compositions and structures from existing datasets. This model generates a search space of plausible hypothetical compound structures, broadening the pool of candidate materials. In doing so, it increases the likelihood of discovering stable compounds with desired properties, while reducing the computational cost.
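The generate-then-screen idea, learning a distribution over known compositions and sampling new candidates from it, can be illustrated in miniature. The sketch below uses a tiny bigram model over element tokens as a stand-in for the LLM; the training set, vocabulary, and all names are illustrative assumptions, not the paper's model or data.

```python
import random
from collections import defaultdict

# Toy stand-in for the generative model: a bigram model over element
# tokens, learned from a tiny made-up set of known compositions.
# Illustrative only -- the paper proposes an LLM, not a bigram model.
train = [
    ["Li", "Fe", "P", "O"],
    ["Li", "Co", "O"],
    ["Na", "Fe", "P", "O"],
    ["Li", "Mn", "O"],
]

# Count element-to-element transitions, with <s>/<e> boundary tokens
counts = defaultdict(lambda: defaultdict(int))
for comp in train:
    seq = ["<s>"] + comp + ["<e>"]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1

def sample_composition(rng, max_len=6):
    """Sample one candidate element sequence from the learned bigrams."""
    tok, out = "<s>", []
    while len(out) < max_len:
        successors = counts[tok]
        choices, weights = zip(*successors.items())
        tok = rng.choices(choices, weights=weights)[0]
        if tok == "<e>":
            break
        out.append(tok)
    return out

# Sample a small search space of hypothetical candidates
rng = random.Random(42)
candidates = [sample_composition(rng) for _ in range(5)]
print(candidates)
```

A real pipeline would then screen such generated candidates with the property-prediction model, reserving expensive first-principles calculations for the most promising ones.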
Presenters
-
Yong Wei
University of North Georgia
Authors
-
Yong Wei
University of North Georgia
-
Mingyuan Yan
University of North Georgia
-
Yuewei Lin
Brookhaven National Laboratory
-
Hanning Chen
University of Texas at Austin