Comparing Different NLP Methods at Grading Open Ended Physics Problems

ORAL

Abstract

A limiting factor in examining student reasoning on conceptual assessments has been that such assessments are typically administered in a multiple-choice (MC) format, which limits the opportunity to analyze more elaborate responses from students who may not fully understand a concept. Tools such as machine learning (ML) and large language models (LLMs) offer a promising means of assessing students' written responses fairly and consistently, providing a possible alternative to MC assessments. Our study compares ML, an LLM, and human raters at classifying students' written explanations as correct or incorrect. We compare these classifications with the ground truth (students' correctness on the corresponding MC questions), allowing us to gauge the accuracy of natural language processing (NLP) techniques in grading written responses. Our preliminary findings show that MC results align best with human raters, though ML and the LLM are not far behind.
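The comparison described above amounts to measuring how well each grader's labels agree with the MC ground truth. A minimal sketch of that kind of agreement check is shown below, assuming hypothetical placeholder labels and standard scikit-learn metrics; it is not the authors' analysis code.

```python
# Minimal sketch (hypothetical data, not the study's code): compare each
# grader's correct/incorrect labels for written explanations against the
# MC ground truth using simple agreement metrics.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Ground truth: 1 = student answered the MC question correctly, 0 = incorrectly.
mc_ground_truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Labels assigned to the same students' written explanations by each grader
# (placeholder values for illustration only).
labels_by_grader = {
    "human": [1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
    "ml":    [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "llm":   [1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
}

for grader, labels in labels_by_grader.items():
    acc = accuracy_score(mc_ground_truth, labels)
    kappa = cohen_kappa_score(mc_ground_truth, labels)
    print(f"{grader:>5}: accuracy = {acc:.2f}, Cohen's kappa = {kappa:.2f}")
```

Accuracy gives raw agreement with the MC ground truth, while Cohen's kappa corrects for chance agreement, which matters when the correct/incorrect split is unbalanced.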

Presenters

  • Sean Savage

    Purdue University, West Lafayette

Authors

  • Sean Savage

    Purdue University, West Lafayette

  • Nikhil Borse

    Purdue University, West Lafayette

  • N. Sanjay Rebello

    Purdue University, West Lafayette