Identifying Sources of Construct-Irrelevant Variance in Performance Testing

 

Practice Area Division(s): Certification/Licensure

Topic: Testing, Measurement, and Psychometrics

Session Type: Breakout

Performance testing occupies a prominent role in licensing and certification, with over one-third of credentialing agencies using methods such as oral exams, work samples, and computer-based simulations. The interest in performance testing stems from a desire for authentic assessments that more directly reflect the tasks encountered in the real world. While performance tests can make a positive contribution to score validity, Messick (1994) cautions that their perceived directness does not remove the need to investigate potential sources of construct-irrelevant variance (CIV).

In this presentation, the presenter will first identify several possible sources of CIV in performance tests, and then summarize a sampling of studies that document the presence of CIV (e.g., examinee likeability; ambiguity in scoring keys). Much of this session will focus on practice effects, that is, the tendency for scores to rise over the course of the testing day and as examinees become familiar with the testing format. Several studies published in recent years indicate that such effects are present on both computer-based simulations and simulated work samples in medicine. For example, studies have found that:

(a) performance improves throughout the testing day due to a notable practice effect, and this effect is larger for examinees with less experience with performance testing formats;

(b) score gains for repeaters on performance tests are about twice the magnitude of score gains on multiple-choice tests, yet there is no advantage for examinees who see a few of the same performance tasks on two occasions; and

(c) repeaters’ scores on the second attempt exhibit better internal consistency and a more meaningful factor structure, and correlate more highly with external criteria, than scores on their first attempt.
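
As an illustration of the internal-consistency comparison in (c), the sketch below computes Cronbach's alpha, one common internal-consistency index, from an examinee-by-task score matrix. The abstract does not say which index the cited studies used, so the choice of alpha, the function name `cronbach_alpha`, and the toy scores are assumptions made only for illustration; the numbers are invented and are not results from any study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of task scores.

    Hypothetical helper for illustration only. Assumes every row has the
    same number of tasks and that total scores are not all identical.
    """
    n_tasks = len(scores[0])
    if n_tasks < 2:
        raise ValueError("alpha needs at least two tasks")
    # Variance of each task (column) across examinees.
    task_vars = [pvariance([row[j] for row in scores]) for j in range(n_tasks)]
    # Variance of each examinee's total score.
    total_var = pvariance([sum(row) for row in scores])
    return (n_tasks / (n_tasks - 1)) * (1 - sum(task_vars) / total_var)


# Invented toy data: 4 examinees x 3 tasks, for each of two attempts.
first_attempt = [[2, 4, 3], [3, 2, 4], [4, 3, 2], [3, 4, 4]]
second_attempt = [[3, 3, 4], [4, 4, 4], [5, 4, 5], [3, 4, 3]]

print("alpha, first attempt: ", round(cronbach_alpha(first_attempt), 3))
print("alpha, second attempt:", round(cronbach_alpha(second_attempt), 3))
```

A higher alpha on the second attempt would be consistent with the pattern described in (c), although a comparison on real data would also need to account for sampling error and test length.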

Collectively, the results of these and other studies suggest that performance tests, while arguably providing a more authentic assessment, are also influenced by CIV in non-ignorable ways. After discussing some of the specific ways that CIV affects score interpretation, the presenter will suggest possible strategies for mitigating its effects.

Presenter: Mark Raymond, National Board of Medical Examiners (NBME)