European Survey Research Association
 

ESRA2009

Warsaw 2009: Presentations and short courses


Impact of test score scaling model on regression and multilevel estimates

Session: IRT: Item Response Theory in Survey Methodology (II)

Author:

  • Maciej Jakubowski; University of Warsaw, Poland

Abstract:

This paper analyzes how the scaling of student achievement test scores affects the final estimates obtained from regression analysis. While general differences between test score scaling models are well established, little is known about their impact on analyses of student achievement using linear regression and multilevel models (see Brown et al., 2006, for a seminal paper discussing the impact of IRT models on basic statistics). The paper explores data from the TIMSS 1995 and PISA 2000 international surveys of student achievement. TIMSS and PISA data are analyzed separately so as not to confound the effect of study design with the impact of scaling methodology. For both surveys, scores from two distinct scaling methodologies are available. In TIMSS 1995, student test scores were scaled using the one-parameter and the three-parameter IRT models. In the PISA datasets, weighted likelihood estimates and plausible values were made available. While survey organizers advise data users on which model should be preferred and what the drawbacks of the other option are, scores from biased methods are widely used in other studies. Thus, our research could help assess the potential risks of using such scores in secondary analysis. We tested three commonly employed regression models: the linear regression model, the multilevel model, and the quantile regression model. The models were estimated with a set of variables typically used in educational studies that attempt to explain student achievement. Individual- and school-level regressors were included. The results were presented for all countries and compared across differently scaled scores. Point estimates with their standard errors, as well as variance decomposition, were discussed. The results suggest that linear and multilevel regression point estimates obtained in a typical analysis are not materially affected by the choice of scaling method.
However, standard errors and variance estimates could be heavily biased, notably affecting the final conclusions. Results of analyses using quantile regression or based on a smaller number of observations (e.g. for subgroups) could also be strongly affected by the scaling model. The paper points out that researchers should fully understand the drawbacks and benefits of the scaling method used to produce the scores they wish to analyze. For their part, survey organizers should clearly describe the scaling process and methodology, discussing the potential risks of using scaled test scores in more demanding analyses.
