Handling Ignorable and Non-ignorable Missing Data through Bayesian Methods in JAGS
DOI:
https://doi.org/10.35566/jbds/v2n2/xu
Keywords:
Missing Data, Bayesian Analysis, Structural Equation Modeling
Abstract
Missing data are prevalent in social science research, so researchers need principled methods for handling them. The Bayesian framework is one approach in which data with missing values can still be used for parameter estimation. This tutorial introduces three missing data mechanisms: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). It then discusses methods for estimating models with missing values under the Bayesian framework, for both ignorable and non-ignorable missingness. A structural equation model fitted to data from the Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) study illustrates how to fit missing data models in JAGS.
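The three mechanisms named in the abstract can be made concrete with a small simulation. The sketch below is an illustration only and is not taken from the tutorial: it uses Python with NumPy rather than JAGS, and the variable names, sample size, and logistic coefficients are all assumptions chosen for demonstration. It generates an always-observed covariate x and an outcome y, then deletes values of y under each mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # assumed sample size, chosen only for a stable illustration

# x is always observed; y is the variable that may go missing.
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)


def logistic(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# MCAR: the probability that y is missing is the same for every case.
miss_mcar = rng.random(n) < 0.3

# MAR: the probability that y is missing depends only on the observed x.
miss_mar = rng.random(n) < logistic(-1.0 + 1.5 * x)

# MNAR: the probability that y is missing depends on the value of y itself.
miss_mnar = rng.random(n) < logistic(-1.0 + 1.5 * y)

# Compare the mean of y over complete data with the mean over observed cases.
means = {
    "complete": y.mean(),
    "MCAR": y[~miss_mcar].mean(),
    "MAR": y[~miss_mar].mean(),
    "MNAR": y[~miss_mnar].mean(),
}

for label, value in means.items():
    print(f"{label:>8}: mean of observed y = {value:.3f}")
```

In this setup the observed-case mean under MCAR should stay close to the complete-data mean, while under MNAR it is pulled downward because large values of y are more likely to be deleted. Under MAR the naive observed-case mean can also be off, but the missingness is explained by the observed x, which is the situation in which a model that conditions on x can ignore the missingness mechanism.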
