Comparing Different NLP Methods at Grading Open Ended Physics Problems
ORAL
Abstract
A limiting factor in examining student reasoning on conceptual assessments has been that such assessments are typically administered in a multiple-choice (MC) format, which limits the opportunity to analyze more elaborate responses from students who may not fully understand a concept. Current tools such as machine learning (ML) and large language models (LLMs) offer a promising opportunity to assess students' written responses fairly and consistently, providing a possible alternative to MC assessments. Our study compares ML, LLMs, and human raters at classifying students' written explanations as correct or incorrect with respect to their correctness on MC questions. We compare the correctness of the written explanations with the ground truth (students' MC correctness), allowing us to verify the accuracy of natural language processing (NLP) techniques in grading written responses. Our preliminary findings show that human raters' classifications best align with the MC results, though ML and LLM performance is not far behind.
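As an illustration of the comparison described in the abstract, the minimal sketch below (not the authors' code) scores each grader's correct/incorrect labels against the MC ground truth. The label arrays and grader names are hypothetical placeholders, and Cohen's kappa is included only as one common agreement measure alongside accuracy.

```python
# Hypothetical sketch: compare each grader's labels to MC ground truth.
from sklearn.metrics import accuracy_score, cohen_kappa_score

# 1 = explanation judged correct, 0 = judged incorrect (made-up data)
mc_ground_truth = [1, 0, 1, 1, 0, 1, 0, 0]

grader_labels = {
    "human": [1, 0, 1, 1, 0, 1, 0, 1],
    "ml":    [1, 0, 1, 0, 0, 1, 0, 1],
    "llm":   [1, 1, 1, 0, 0, 1, 0, 1],
}

for grader, labels in grader_labels.items():
    acc = accuracy_score(mc_ground_truth, labels)      # fraction matching MC correctness
    kappa = cohen_kappa_score(mc_ground_truth, labels)  # chance-corrected agreement
    print(f"{grader}: accuracy={acc:.2f}, Cohen's kappa={kappa:.2f}")
```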
Presenters
- Sean Savage, Purdue University, West Lafayette
Authors
- Sean Savage, Purdue University, West Lafayette
- Nikhil Borse, Purdue University, West Lafayette
- N. Sanjay Rebello, Purdue University, West Lafayette