Evaluating Large Language Models for Small-Scale Data Analysis in Physics Education

POSTER

Abstract

This study investigates the potential of large language models (LLMs) to assist in the analysis of small-scale data sets, using previously collected survey responses compiled in an Excel spreadsheet. We compare the performance of several prominent AI platforms (e.g., ChatGPT, Grok, Copilot, and Gemini) in a two-phase evaluation. In Phase 1, each AI was prompted with standardized, broadly applicable research questions (e.g., “What trends are present in this data set?” or “How might this data inform future studies?”) without further clarification or correction. This phase was designed to assess each model’s baseline utility and reliability when applied directly to social science data. In Phase 2, the same AIs were re-engaged with the same questions but given limited human assistance, such as correction of factual or analytical errors, while the original inquiry style was maintained. This comparative approach helps identify whether and how human intervention improves AI performance in qualitative and semi-quantitative data interpretation. By highlighting the capabilities and limitations of current LLMs in this context, the study aims to inform more effective, scalable approaches to analyzing student feedback and instructional outcomes, ultimately supporting evidence-based improvements in physics teaching, especially in settings where only limited data are available.
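
The two-phase protocol can be pictured as a simple scripted loop. The sketch below is purely illustrative: the file name survey_responses.xlsx and the query_model() helper are hypothetical stand-ins for each platform's own chat interface, while the platform names and example questions are taken from the abstract above.

```python
# Illustrative sketch of the two-phase prompting workflow, assuming the survey
# responses are stored in "survey_responses.xlsx" and that query_model() is a
# hypothetical placeholder for each platform's chat interface or API.

import pandas as pd

# Load the previously collected survey responses.
responses = pd.read_excel("survey_responses.xlsx")

# Standardized, broadly applicable research questions used in both phases.
QUESTIONS = [
    "What trends are present in this data set?",
    "How might this data inform future studies?",
]

PLATFORMS = ["ChatGPT", "Grok", "Copilot", "Gemini"]


def query_model(platform: str, prompt: str) -> str:
    """Hypothetical placeholder: send `prompt` to `platform` and return its reply."""
    return f"[{platform} response to: {prompt[:40]}...]"


def run_phase(corrections: dict[str, str] | None = None) -> dict[tuple[str, str], str]:
    """Ask every platform every question.

    Phase 1: corrections is None (no clarification or correction given).
    Phase 2: corrections maps a platform name to a short human note pointing
    out factual or analytical errors observed in that platform's Phase 1 output.
    """
    results = {}
    for platform in PLATFORMS:
        for question in QUESTIONS:
            prompt = f"{question}\n\nData:\n{responses.to_csv(index=False)}"
            if corrections and platform in corrections:
                prompt += f"\n\nNote: {corrections[platform]}"
            results[(platform, question)] = query_model(platform, prompt)
    return results


# Phase 1: baseline responses with no human intervention.
phase1 = run_phase()

# Phase 2: the same questions, with brief human corrections added per platform.
phase2 = run_phase(corrections={"ChatGPT": "example correction from the Phase 1 review"})
```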

Presenters

  • Matthew Rundquist

    Brigham Young University

Authors

  • Matthew Rundquist

    Brigham Young University

  • James C Hecht

    Brigham Young University

  • Seth Read

    Brigham Young University

  • Andrew J Mason

    University of Central Arkansas

  • John S Colton

    Brigham Young University