Applying Large Language Models to the Teaching and Assessment of Introductory Level Physics: the good, the bad and the unexpected

ORAL · Invited

Abstract

The rapid development of Generative AI, in particular large language models (LLMs), has the potential to transform almost every aspect of STEM education, ranging from content creation and student interaction to assessment and evaluation. In this talk I will share several recent attempts at using large language models to improve teaching and learning in the context of introductory-level Newtonian mechanics. First, by providing LLMs with examples and knowledge from physics education research, they can be used to write constructive and personalized feedback on students' written responses to conceptual questions. Our research shows that students cannot reliably distinguish AI-generated feedback from human-generated feedback, and often rate AI feedback as more useful. Second, the latest LLM, GPT-4o, can grade students' written explanations of their problem-solving process according to a multi-item rubric, with accuracy comparable to human raters at a fraction of the cost. This is achieved by adding explanation text to each rubric item that specifically addresses the weaknesses of LLM grading. In addition, LLM graders can also flag potentially problematic grading cases for human review, and write personalized grading feedback to increase grading transparency for students. Lastly, while the latest LLMs can easily and accurately solve most standard introductory-level physics problems, I will show that they can still be "tricked" into committing reasoning errors that may seem obvious to humans in unexpected ways. Attempts at tricking LLMs not only shed light on the limitations of LLM reasoning, but also have direct applications in teaching and learning, such as creating AI-proof problems that cannot be directly solved by an LLM. It also suggests the possibility of developing novel forms of assessment items based on GenAI.
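The rubric-grading approach described above can be illustrated with a minimal sketch. The rubric items, point values, and "grading note" wording below are hypothetical examples, not the actual rubric from the study; the sketch only shows the general idea of attaching explanation text to each rubric item before sending the assembled prompt to an LLM.

```python
# Hypothetical sketch of rubric-based grading prompt assembly.
# Each rubric item carries explanation text meant to address a known
# weakness of LLM grading (e.g., inferring credit from a correct answer).
# All item wording below is illustrative, not taken from the study.

RUBRIC = [
    {
        "item": "Identifies all forces acting on the object",
        "points": 1,
        "explanation": (
            "Award credit only if every force is explicitly named; "
            "do not infer unstated forces from a correct final answer."
        ),
    },
    {
        "item": "Applies Newton's second law correctly",
        "points": 2,
        "explanation": (
            "A correct numeric answer with flawed reasoning earns no credit; "
            "grade the written reasoning, not the result."
        ),
    },
]


def build_grading_prompt(student_response: str) -> str:
    """Combine rubric items and their grading notes into one prompt."""
    lines = ["Grade the student response against each rubric item.", ""]
    for i, r in enumerate(RUBRIC, start=1):
        lines.append(f"{i}. {r['item']} ({r['points']} pt)")
        lines.append(f"   Grading note: {r['explanation']}")
    lines += [
        "",
        "Student response:",
        student_response,
        "",
        "Report a score and a one-sentence justification per item.",
    ]
    return "\n".join(lines)
```

The assembled string would then be sent to the model's chat API; per-item scores could be parsed from the reply, and low-confidence or disagreeing items flagged for human review.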

Publications

  • Chen, Z., & Wan, T. (2024). Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering. 2024 Physics Education Research Conference Proceedings, 97–101. https://doi.org/10.1119/perc.2024.pr.Chen

  • Wan, T., & Chen, Z. (2024). Exploring generative AI assisted feedback writing for students' written responses to a physics conceptual question with prompt engineering and few-shot learning. Physical Review Physics Education Research, 20(1), 010152. https://doi.org/10.1103/PhysRevPhysEducRes.20.010152

Presenters

  • Zhongzhou Chen

    University of Central Florida

Authors

  • Zhongzhou Chen

    University of Central Florida

  • Tong Wan

    University of Central Florida