Examining systematic bias in classroom exam scores

ORAL

Abstract

We investigate the extent to which systematic biases may exist in introductory physics classroom exams. We examine both Differential Item Functioning (DIF) and biases in overall classroom exam scores with respect to sex, race, and first-generation status. For DIF, we compare the strengths and weaknesses of three methods, ranging from a common simple method with many simplifying assumptions to a more sophisticated latent variable model. For overall exam performance, we compare sum scores, which are typically used in education, to factor scores, which account for random variation in each item and allow for differential weighting of items based on their relative difficulty. We investigated exams in two courses: algebra-based (N = 611) and calculus-based (N = 736) introductory physics. Results indicate that 1-2 items (out of 30) on each final exam showed DIF for at least one demographic group, though the items showing DIF sometimes depended on the analysis method. Sum scores and factor scores were highly correlated, but there was still substantial spread: a given sum score could correspond to factor scores differing by up to one standard deviation. Both kinds of scores showed significant differences in exam performance for all three demographic groups, consistent with prior research. Overall, we do see evidence of systematic bias in "in situ" classroom exams: some items showed DIF, and demographic differences in exam scores appeared whether analyzed via sum scores or factor scores.
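
The abstract does not specify which simple DIF method was used, but the Mantel-Haenszel procedure is the most common choice fitting that description. The sketch below, written for illustration only (the variable names `item`, `total`, and `group` are assumptions, not the authors' code), stratifies examinees on a matching score and tests whether the studied item behaves the same way in both groups within strata:

```python
import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel DIF test for one dichotomous item.
    item  : 0/1 responses to the studied item
    total : matching criterion (e.g., rest score = sum score minus this item)
    group : 0 = reference group, 1 = focal group
    Returns (MH chi-square, p-value, ETS delta)."""
    num, den = 0.0, 0.0                    # parts of the common odds ratio
    a_sum, e_sum, v_sum = 0.0, 0.0, 0.0    # parts of the chi-square statistic
    for t in np.unique(total):             # stratify on the matching score
        s = total == t
        A = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        B = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        C = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        D = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        N = A + B + C + D
        if N < 2 or (A + C) == 0 or (B + D) == 0:
            continue                       # stratum carries no information
        a_sum += A
        e_sum += (A + B) * (A + C) / N
        v_sum += (A + B) * (C + D) * (A + C) * (B + D) / (N**2 * (N - 1))
        num += A * D / N
        den += B * C / N
    stat = (abs(a_sum - e_sum) - 0.5) ** 2 / v_sum   # continuity-corrected
    pval = chi2.sf(stat, df=1)
    delta = -2.35 * np.log(num / den)                # ETS delta scale
    return stat, pval, delta

# Example: screen every item on an exam, matching on the rest score.
# responses: (n_students, n_items) 0/1 matrix; group: 0/1 labels (hypothetical).
# for j in range(responses.shape[1]):
#     rest = responses.sum(axis=1) - responses[:, j]
#     stat, p, delta = mantel_haenszel_dif(responses[:, j], rest, group)
```

The sum-score vs. factor-score comparison can also be sketched. The authors' latent variable model is not specified; as a rough stand-in, a one-factor model (here scikit-learn's FactorAnalysis, which treats item scores as continuous rather than fitting a proper IRT model) yields one ability estimate per student that weights items unequally, which can then be correlated with the plain sum score:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# responses: (n_students, n_items) 0/1 matrix (hypothetical data)
fa = FactorAnalysis(n_components=1)
factor_scores = fa.fit_transform(responses)[:, 0]  # one latent score per student
sum_scores = responses.sum(axis=1)                 # usual classroom exam score

# High correlation is expected, but students with the same sum score
# can still receive different factor scores, as the abstract reports.
r = np.corrcoef(sum_scores, factor_scores)[0, 1]
print(f"correlation between sum and factor scores: {r:.3f}")
```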

Presenters

  • Andrew F Heckler

    Ohio State University

Authors

  • Andrew F Heckler

    Ohio State University

  • Siyuan Marco Chen

    Ohio State University

  • Jolynn Pek

    Ohio State University