Weighted and Unweighted Correlation Methods for Large-Scale Educational Assessment: wCorr Formulas

Ahmad Emad
Qingshu Xie, MacroSys
Emmanuel Sikali, National Center for Education Statistics

Correlation analysis is widely used by researchers and analysts working with large-scale assessment data. However, limited research has provided reliable methods for estimating the various correlations and their standard errors while taking the complex sampling design and multiple plausible values into account. This report introduces the methodology used by the wCorr R package (Emad & Bailey, 2017) for computing the Pearson, Spearman, polyserial, and polychoric correlations, with and without weights applied. The methodology treats the tetrachoric correlation as a special case of the polychoric correlation and the biserial correlation as a special case of the polyserial correlation.
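
As an illustration of what applying weights to a correlation means, the following minimal Python sketch computes a weighted Pearson correlation using the standard textbook definition (weighted means, weighted covariance, and weighted variances). It is not taken from the wCorr package, which is written in R; the function name `weighted_pearson` and its arguments are hypothetical and chosen only for this example.

```python
import math


def weighted_pearson(x, y, w=None):
    """Weighted Pearson correlation (textbook definition, illustrative only).

    x, y : sequences of numbers of equal length
    w    : optional sequence of non-negative weights; None means equal weights
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if w is None:
        w = [1.0] * n  # the unweighted case is the special case of equal weights
    if len(w) != n:
        raise ValueError("w must have the same length as x and y")

    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted means of x and y
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted sums of cross-products and squares about the weighted means
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")

    return sxy / math.sqrt(sxx * syy)


# Unweighted call on three observations
print(weighted_pearson([1, 2, 3], [1, 3, 2]))  # 0.5
```

Under this definition, an integer weight of k on an observation has the same effect as repeating that observation k times, which is one way to sanity-check a weighted estimator against its unweighted counterpart.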

This paper is part of a series of AIR-NAEP working papers that showcase AIR’s expertise and experience not only with NAEP but with other large-scale assessments and survey-based longitudinal studies.

Simulation evidence is presented to demonstrate the correctness of the methods, including an examination of their bias and consistency. Overall, the simulations show first-order convergence for each unweighted correlation coefficient with an approximately linear computation cost. Further, under our simulation assumptions, the weighted correlation performs better than the unweighted correlation for all correlation coefficients.

We show first-order convergence of the weighted Pearson, polyserial, and polychoric correlation coefficients. Under the assumptions of our simulation, the Spearman correlation coefficient does not consistently estimate the population Pearson correlation coefficient, but it does consistently estimate the population Spearman correlation coefficient.