
Methods for trustworthy application of Large Language Models in PER

ORAL

Abstract

Within physics education research (PER), a growing body of literature investigates the use of natural language processing and machine learning algorithms to apply coding schemes to student writing. The aspiration is that this form of measurement may be more efficient and consistent than comparable measurements made through human analysis, allowing larger and broader data sets to be analyzed. In our work, we harness recent innovations in Large Language Models (LLMs), such as BERT and LLaMA, to learn complex coding scheme rules. Furthermore, we leverage methods from uncertainty quantification to help assess the trustworthiness of these measurements. In this talk, I will demonstrate a successful application of LLMs to measuring experimental skills in lab notes and apply our methodology to evaluate the statistical and systematic uncertainty in this form of algorithmic measurement.
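To illustrate the kind of statistical uncertainty estimate discussed above, the sketch below attaches a bootstrap standard error to a machine-coded fraction. This is a minimal sketch under our own assumptions, not the authors' published method: the synthetic 0/1 codes stand in for labels that would, in practice, come from a fine-tuned BERT-style classifier applied to lab notes, and bootstrap_fraction is a hypothetical helper name.

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrap_fraction(codes, n_boot=10_000):
        """Point estimate and bootstrap standard error of the fraction of
        lab notes machine-coded as exhibiting a skill (1 = present)."""
        codes = np.asarray(codes)
        # Resample the codes with replacement and measure the fraction in
        # each resample; the spread of those fractions estimates the
        # statistical uncertainty of the measurement.
        resamples = rng.choice(codes, size=(n_boot, codes.size), replace=True)
        return codes.mean(), resamples.mean(axis=1).std(ddof=1)

    # Synthetic stand-in for classifier output over 200 lab notes; in
    # practice these codes would come from a trained LLM-based classifier.
    codes = rng.integers(0, 2, size=200)
    estimate, stat_err = bootstrap_fraction(codes)
    print(f"fraction coded present: {estimate:.3f} +/- {stat_err:.3f} (statistical)")

Systematic uncertainty, by contrast, would require comparing such machine-produced codes against human coding or across model variants, which this sketch does not attempt.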

Publication: Rebeckah K. Fussell, Emily M. Stump, and N. G. Holmes, A method to assess trustworthiness of machine coding at scale (arXiv, 2023), https://arxiv.org/abs/2310.02335v2.

Presenters

  • Rebeckah Fussell

    Cornell University

Authors

  • Rebeckah Fussell

    Cornell University

  • Megan Flynn

    Cornell University

  • Anil Damle

    Cornell University

  • Natasha G. Holmes

    Cornell University