Tuesday, March 03, 2015, 2:30 PM – 3:15 PM (PST)

IGNITE SESSIONS

Moderated By:
Ashok Sarathy, Innovations in Testing Program Chair
Rachel Schoenig, Innovations in Testing Program Vice Chair

IGNITE 4 - Reconceptualizing Score Comparability in the Era of Devices

Practice Area Division(s): Education
Topic: Testing, Measurement, and Psychometrics

A traditional requirement in testing programs is that scores derived from tests given in different formats should be comparable. For example, if a test is offered in paper-and-pencil format and also online, then a score reported for a paper-and-pencil administration should be the same as the expected score on the online version of the same test. Persons at the same level of ability, per the measured construct, are expected to obtain the same score regardless of format. In the wake of a proliferation of devices of various kinds in schools and workplaces, this conceptualization of test equivalence is both reasonable and problematic.

It is reasonable because it expresses the fundamental requirement of construct-relevance and fairness in all testing. When important decisions are made as a consequence of test scores, it is not appropriate for expected scores to depend on the device on which individuals took an exam. This is especially true in settings where devices with potentially positive effects on scores are less available to those with fewer resources – for example, students in low-income districts.

At the same time, score equivalence conceived of as format independence is problematic today because of the increasingly intimate relationship between platform and performance. Students who do much of their course work on a given device will be better equipped to do well on a test administered on that type of device than in a different format. Moreover, individual preferences and choices can produce differences in fluency across devices, so the optimal device for assessing one individual may not be the optimal device for assessing another.

In this Ignite Session, the presenters argue that traditional notions of score equivalence across test formats have been strained by the availability and use of new platforms for assessment. This calls for a reconceptualization of score comparability that more explicitly lays out assumptions that are infeasible to test in every case but are critical nonetheless. Test score and device-dependence rating data from tests administered in different formats are summarized to illustrate the difficulties of claiming comparability in the traditional sense. Implications for exam policy and for communication with examinees are presented.

PRESENTER:
William Lorie, Questar Assessment, Inc.