Graph and Generative Large Language Models for Data-Driven Materials Discovery
ORAL
Abstract
Existing materials datasets, such as the Materials Project and Automatic FLOW for Materials Discovery (AFLOW), are invaluable for leveraging machine learning (ML) to accelerate the discovery of new materials. However, the diverse compound compositions and crystal structures across these datasets pose a significant challenge, demanding ML models with robust generalization capabilities. Furthermore, the limited number of compounds within specific categories of interest impedes the training of accurate models tailored to targeted applications.
To address the challenge of generalization across diverse materials, we propose a graph convolution algorithm, Chemical Environment Adaptive Learning with Learnable Weighting Functions (CEAL-WF). CEAL-WF employs a set of aggregators to capture key atomic interactions within local chemical environments. Its dynamic learnable weighting functions adjust the influence of aggregated messages from neighboring atoms, improving model performance across a broad range of materials.
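The core idea of combining multiple neighbor aggregators under a learnable, environment-dependent weighting can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and a simple linear-plus-softmax gate; the function and parameter names (`ceal_wf_convolution`, `W_gate`) are hypothetical and do not reflect the authors' actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ceal_wf_convolution(center, neighbors, W_gate):
    """Toy sketch of one CEAL-WF-style convolution step (illustrative only).

    Several aggregators summarize the neighbor messages; a learnable
    gate (here, a linear map on the center atom's features followed by
    a softmax) weights each aggregator's contribution, so the model can
    adapt to different local chemical environments.
    """
    # Candidate aggregations over the neighbor feature vectors
    agg = np.stack([
        neighbors.mean(axis=0),   # average environment signal
        neighbors.max(axis=0),    # strongest interaction
        neighbors.sum(axis=0),    # total interaction strength
    ])                            # shape: (3, feature_dim)

    # Learnable weighting conditioned on the center atom's features
    weights = softmax(W_gate @ center)  # shape: (3,), sums to 1

    # Weighted combination of the aggregated messages
    return (weights[:, None] * agg).sum(axis=0)

# Toy example: one atom with three neighbors, 4-dimensional features
rng = np.random.default_rng(0)
center = rng.normal(size=4)
neighbors = rng.normal(size=(3, 4))
W_gate = rng.normal(size=(3, 4))  # maps center features to 3 gate scores
out = ceal_wf_convolution(center, neighbors, W_gate)
print(out.shape)  # (4,)
```

In a full model, `W_gate` would be trained end-to-end with the rest of the network, letting the gate shift emphasis among aggregators per atom.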
To overcome the limitation of data scarcity in specific material categories, we propose a generative large language model (LLM) trained on materials compositions and structures from existing datasets. This model generates a search space of plausible hypothetical compound structures, broadening the pool of candidate materials. In doing so, it increases the likelihood of discovering stable compounds with desired properties, while reducing the computational cost.
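The generate-then-screen idea, learning a distribution over known compositions and sampling new candidates from it, can be illustrated in miniature. The sketch below uses a tiny bigram model over element tokens as a stand-in for the LLM; the training set, vocabulary, and all names are illustrative assumptions, not the paper's model or data.

```python
import random
from collections import defaultdict

# Toy stand-in for the generative model: a bigram model over element
# tokens, learned from a tiny made-up set of known compositions.
# Illustrative only -- the paper proposes an LLM, not a bigram model.
train = [
    ["Li", "Fe", "P", "O"],
    ["Li", "Co", "O"],
    ["Na", "Fe", "P", "O"],
    ["Li", "Mn", "O"],
]

# Count element-to-element transitions, with <s>/<e> boundary tokens
counts = defaultdict(lambda: defaultdict(int))
for comp in train:
    seq = ["<s>"] + comp + ["<e>"]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1

def sample_composition(rng, max_len=6):
    """Sample one candidate element sequence from the learned bigrams."""
    tok, out = "<s>", []
    while len(out) < max_len:
        successors = counts[tok]
        choices, weights = zip(*successors.items())
        tok = rng.choices(choices, weights=weights)[0]
        if tok == "<e>":
            break
        out.append(tok)
    return out

# Sample a small search space of hypothetical candidates
rng = random.Random(42)
candidates = [sample_composition(rng) for _ in range(5)]
print(candidates)
```

A real pipeline would then screen such generated candidates with the property-prediction model, reserving expensive first-principles calculations for the most promising ones.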
Presenters
-
Yong Wei
University of North Georgia
Authors
-
Yong Wei
University of North Georgia
-
Mingyuan Yan
University of North Georgia
-
Yuewei Lin
Brookhaven National Laboratory
-
Hanning Chen
University of Texas at Austin