
Methods for trustworthy application of Large Language Models in PER

ORAL

Abstract

Within physics education research (PER), a growing body of literature investigates the use of natural language processing and machine learning algorithms to apply coding schemes to student writing. The aspiration is that this form of measurement may be more efficient and consistent than comparable measurements made through human analysis, allowing larger and broader data sets to be analyzed. In our work, we harness recent innovations in Large Language Models (LLMs), such as BERT and LLaMA, to learn complex coding scheme rules. Furthermore, we leverage methods from uncertainty quantification to help assess the trustworthiness of these measurements. In this talk, I will demonstrate a successful application of LLMs to measuring experimental skills in lab notes and apply our methodology to evaluate the statistical and systematic uncertainty in this form of algorithmic measurement.
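To illustrate the kind of statistical uncertainty estimate discussed above, the sketch below attaches a bootstrap standard error to a machine-coded fraction. This is a minimal sketch under our own assumptions, not the authors' published method: the synthetic 0/1 codes stand in for labels that would, in practice, come from a fine-tuned BERT-style classifier applied to lab notes, and bootstrap_fraction is a hypothetical helper name.

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrap_fraction(codes, n_boot=10_000):
        """Point estimate and bootstrap standard error of the fraction of
        lab notes machine-coded as exhibiting a skill (1 = present)."""
        codes = np.asarray(codes)
        # Resample the codes with replacement and measure the fraction in
        # each resample; the spread of those fractions estimates the
        # statistical uncertainty of the measurement.
        resamples = rng.choice(codes, size=(n_boot, codes.size), replace=True)
        return codes.mean(), resamples.mean(axis=1).std(ddof=1)

    # Synthetic stand-in for classifier output over 200 lab notes; in
    # practice these codes would come from a trained LLM-based classifier.
    codes = rng.integers(0, 2, size=200)
    estimate, stat_err = bootstrap_fraction(codes)
    print(f"fraction coded present: {estimate:.3f} +/- {stat_err:.3f} (statistical)")

Systematic uncertainty, by contrast, would require comparing such machine-produced codes against human coding or across model variants, which this sketch does not attempt.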

Publication: Rebeckah K. Fussell, Emily M. Stump, and N. G. Holmes, A method to assess trustworthiness of machine coding at scale (arXiv, 2023), https://arxiv.org/abs/2310.02335v2.

Presenters

  • Rebeckah Fussell

    Cornell University

Authors

  • Rebeckah Fussell

    Cornell University

  • Megan Flynn

    Cornell University

  • Anil Damle

    Cornell University

  • Natasha G. Holmes

    Cornell University