Handling Ignorable and Non-ignorable Missing Data through Bayesian Methods in JAGS

Authors

  • Ziqian Xu, University of Notre Dame

DOI:

https://doi.org/10.35566/jbds/v2n2/xu

Keywords:

Missing Data, Bayesian Analysis, Structural Equation Modeling

Abstract

Because missing data are prevalent in social science research, appropriate methods for handling them are needed. The Bayesian framework is one in which data with missing values can still be used for parameter estimation. This tutorial introduces three missing data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). It then discusses methods for estimating models with missing values under the Bayesian framework, for both ignorable and non-ignorable missingness. A structural equation model fitted to data from the Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) study illustrates how to fit missing data models in JAGS.
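
The difference between the three mechanisms can be made concrete with a small simulation. The sketch below is not the tutorial's JAGS code. It is a language-neutral illustration written in Python, and it assumes a hypothetical data-generating model (y = 0.8x + noise) with hypothetical logistic missingness rules. It deletes values of y under MCAR, MAR (missingness depends only on the observed x), and MNAR (missingness depends on y itself), then compares simple complete-case estimates. It uses complete-case analysis rather than a Bayesian model, only to show how each mechanism distorts naive estimates.

```python
import math
import random


def logistic(z):
    """Inverse-logit link that turns a linear score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, seed=1):
    """Simulate y = 0.8*x + noise plus three missingness indicators for y.

    All coefficients are hypothetical values chosen for illustration.
    True in a mask means that y is missing for that case.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.8 * xi + rng.gauss(0.0, 0.6) for xi in x]
    masks = {
        # MCAR: every y has the same 30% chance of being missing.
        "MCAR": [rng.random() < 0.3 for _ in range(n)],
        # MAR: missingness depends only on the always-observed x.
        "MAR": [rng.random() < logistic(-1.0 + 1.5 * xi) for xi in x],
        # MNAR: missingness depends on the (unobserved) value of y itself.
        "MNAR": [rng.random() < logistic(-1.0 + 1.5 * yi) for yi in y],
    }
    return x, y, masks


def complete_case_mean(y, missing):
    """Mean of y over the cases where y was observed."""
    observed = [yi for yi, m in zip(y, missing) if not m]
    return sum(observed) / len(observed)


def complete_case_slope(x, y, missing):
    """OLS slope of y on x using only the cases where y was observed."""
    pairs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
    mx = sum(p[0] for p in pairs) / len(pairs)
    my = sum(p[1] for p in pairs) / len(pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    return sxy / sxx


if __name__ == "__main__":
    x, y, masks = simulate()
    print(f"full-data mean of y: {sum(y) / len(y):.3f}")
    for name, missing in masks.items():
        print(f"{name}: mean={complete_case_mean(y, missing):.3f} "
              f"slope={complete_case_slope(x, y, missing):.3f}")
```

Under these assumed settings, MCAR leaves both the mean of y and the slope of y on x approximately unbiased. MAR biases the complete-case mean of y, but the slope stays approximately unbiased because missingness depends only on x, which the model conditions on. MNAR biases both, which is why non-ignorable missingness generally requires modeling the missingness mechanism jointly with the data.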

Published

2022-12-13

How to Cite

Xu, Z. (2022). Handling Ignorable and Non-ignorable Missing Data through Bayesian Methods in JAGS. Journal of Behavioral Data Science, 2(2), 99-126. https://doi.org/10.35566/jbds/v2n2/xu