Zero-Frequency Cell Correction Strategies in Tetrachoric Correlation Estimation: Expanded Strategies and Multivariate Implications

Authors

Choi, J., & Wu, H.

DOI:

https://doi.org/10.35566/jbds/choiwu

Keywords:

Tetrachoric Correlation, Zero-frequency Cells, Binary Data, Tetrachoric Correlation Matrix, Confirmatory Factor Analysis

Abstract

Zero-frequency cells pose a challenge for tetrachoric correlation estimation, but investigation of correction strategies remains limited. This study evaluates several zero-cell correction strategies, which differ in the value added, the way the value is added, and whether unadjusted or adjusted thresholds are used in the second stage of the two-stage procedure. These strategies are examined across different correlation sizes and thresholds, first for estimating a single tetrachoric correlation and then in multivariate applications involving a tetrachoric correlation matrix and a confirmatory factor analysis model for binary data. Using multiple evaluation criteria, we show how the strategies perform differently across correlation sizes and threshold patterns. This study also introduces ways to improve the computational efficiency of tetrachoric correlation simulation studies by leveraging the discrete structure to reduce redundant computations.
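As a rough illustration of the design factors named in the abstract, the following Python sketch estimates a single tetrachoric correlation from a 2×2 table with a two-stage procedure. It is not the authors' implementation. The function names, the default added value of 0.5, the `add_to` options, and the reading of "adjusted thresholds" as thresholds recomputed from the corrected table are all assumptions made for illustration. The sketch also assumes that no marginal count is zero.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def bvn_cdf(a, b, rho):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho."""
    def density(r):
        q = (a * a - 2.0 * r * a * b + b * b) / (2.0 * (1.0 - r * r))
        return np.exp(-q) / (2.0 * np.pi * np.sqrt(1.0 - r * r))
    # Identity used here: the derivative of the CDF with respect to rho
    # equals the bivariate normal density evaluated at (a, b).
    integral, _ = quad(density, 0.0, rho)
    return norm.cdf(a) * norm.cdf(b) + integral


def thresholds(table):
    """Stage 1: thresholds from marginal proportions (rows: X = 0, 1; columns: Y = 0, 1)."""
    n = table.sum()
    tau_x = norm.ppf(table[0, :].sum() / n)  # inverse normal CDF of P(X = 0)
    tau_y = norm.ppf(table[:, 0].sum() / n)  # inverse normal CDF of P(Y = 0)
    return tau_x, tau_y


def tetrachoric_two_stage(table, add=0.5, add_to="zero", adjusted_thresholds=False):
    """Hypothetical helper: two-stage estimate with an optional zero-cell correction."""
    observed = np.asarray(table, dtype=float)
    corrected = observed.copy()
    if (observed == 0).any():
        if add_to == "zero":
            corrected[observed == 0] += add  # add the value to zero cells only
        else:
            corrected += add                 # add the value to every cell
    # Assumption: "adjusted" thresholds come from the corrected table,
    # "unadjusted" thresholds come from the observed table.
    tau_x, tau_y = thresholds(corrected if adjusted_thresholds else observed)

    def neg_loglik(rho):
        p00 = bvn_cdf(tau_x, tau_y, rho)
        p01 = norm.cdf(tau_x) - p00
        p10 = norm.cdf(tau_y) - p00
        p11 = 1.0 - p00 - p01 - p10
        probs = np.clip(np.array([[p00, p01], [p10, p11]]), 1e-12, 1.0)
        return -(corrected * np.log(probs)).sum()

    # Stage 2: maximize the likelihood over rho with the thresholds held fixed.
    result = minimize_scalar(neg_loglik, bounds=(-0.999, 0.999), method="bounded")
    return result.x
```

For example, `tetrachoric_two_stage([[40, 10], [0, 50]], add=0.5, add_to="zero")` applies the correction to the single empty cell before the second stage, and passing `adjusted_thresholds=True` switches the thresholds to those of the corrected table. Without any correction, the estimate for such a table is pushed toward the boundary of the search interval.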

Published

2026-02-20

Section

Theory and Methods

How to Cite

Choi, J., & Wu, H. (2026). Zero-Frequency Cell Correction Strategies in Tetrachoric Correlation Estimation: Expanded Strategies and Multivariate Implications. Journal of Behavioral Data Science, 6(1), 1-40. https://doi.org/10.35566/jbds/choiwu
