\( \newcommand{\vech}{{\rm vech}} \newcommand{\ve}{{\rm vec}} %\newcommand{\E}{{ E}} \newcommand{\Var}{{\rm Var}} \newcommand{\Cov}{{\rm Cov}} \newcommand{\Acov}{{\rm Acov}} \newcommand{\diag}{{\rm diag}} \newcommand{\bone}{{\bf 1}} \newcommand{\RMSEA}{{\rm RMSEA}} \newcommand{\tr}{{\rm tr}} \newcommand{\s}{\rm s} %\newcommand{\min}{\rm min} \newcommand{\limn}{{\lim_{n\rightarrow\infty}}} \newcommand{\tho}{\theta_0} \newcommand{\htn}{\hat\theta} \newcommand{\fot}{^{\frac{1}{2}}} \newcommand{\si}{{\sum_{i=1}^n}} \newcommand{\sii}{\frac{1}{n}\si} \newcommand{\cas}{\stackrel {a.s.}{\longrightarrow}} \newcommand{\cp}{\stackrel {P}{\longrightarrow}} \newcommand{\cl}{\stackrel {\cal L}{\longrightarrow}} %\newcommand{\cid}{\stackrel {\cal L}{\to}} \newcommand{\bDelta}{{\bf \Delta}} \newcommand{\bGamma}{{\bf \Gamma}} %\newcommand{\bLambda}{{\bf \Lambda}} \newcommand{\bOmega}{{\bf \Omega}} \newcommand{\bPhi}{{\bf \Phi}} \newcommand{\bPi}{{\bf \Pi}} %\newcommand{\boldPsi}{{\bf \Psi}} %\newcommand{\bSigma}{{\bf \Sigma}} \newcommand{\bUpsilon}{{\bf \Upsilon}} \newcommand{\bXi}{{\bf \Xi}} \newcommand{\byi}{{\bf \yi}} \newcommand{\bbeta}{\mathbf{\beta}} \newcommand{\bhbn}{\hat{\mbox{\boldmath $\beta$}}} \newcommand{\bchi}{\mbox{\boldmath $\chi$}} \newcommand{\bdelta}{\mbox{\boldmath $\delta$}} \newcommand{\bepsilon}{\mbox{\boldmath $\epsilon$}} \newcommand{\betab}{\mbox{\boldmath $\eta$}} \newcommand{\bgamma}{\mathbf{\gamma}} \newcommand{\biota}{\mbox{\boldmath $\iota$}} %\newcommand{\balpha}{{\bf \alpha}} \newcommand{\balpha}{\mbox{\boldmath $\alpha$}} \newcommand{\boldeta}{\mathbf{\eta}} \newcommand{\bkappa}{\mbox{\boldmath $\tau$}} \newcommand{\bLambda}{\boldsymbol{\Lambda}} \newcommand{\blambda}{\mbox{\boldmath $\lambda$}} \newcommand{\bmu}{\mbox{\boldmath $\mu$}} \newcommand{\bnu}{\mbox{\boldmath $\nu$}} \newcommand{\bomega}{\mbox{\boldmath $\omega$}} \newcommand{\bpi}{\mbox{\boldmath $\pi$}} \newcommand{\bphi}{\mbox{\boldmath $\phi$}} \newcommand{\bpsi}{\mbox{\boldmath 
$\psi$}} \newcommand{\bPsi}{\mathbf{\Psi}} \newcommand{\bsigma}{\mbox{\boldmath $\sigma$}} \newcommand{\bSigma}{\mbox{\boldmath $\Sigma$}} \newcommand{\btau}{\mbox{\boldmath $\tau$}} \newcommand{\bTheta}{\mathbf{\Theta}} \newcommand{\btheta}{\mbox{\boldmath $\theta$}} \newcommand{\btho}{{\mbox{\boldmath $\btheta$}}_0} \newcommand{\bhtn}{\hat{\mbox{\boldmath $\theta$}}} \newcommand{\btdtn}{\tilde{\mbox{\boldmath $\theta$}}} \newcommand{\bupsilon}{\mbox{\boldmath $\upsilon$}} \newcommand{\bvarepsilon}{\mbox{\boldmath $\varepsilon$}} \newcommand{\bvarphi}{\mathbf{\varphi}} \newcommand{\bvartheta}{\mbox{\boldmath $\vartheta$}} \newcommand{\bxi}{\mathbf{\xi}} \newcommand{\bzeta}{\mbox{\boldmath $\zeta$}} \newcommand{\bA}{{\bf A}} \newcommand{\ba}{{\bf a}} \newcommand{\bB}{{\bf B}} \newcommand{\bb}{{\bf b}} \newcommand{\bC}{{\bf C}} \newcommand{\bc}{{\bf c}} \newcommand{\bD}{{\bf D}} \newcommand{\bE}{{\bf E}} \newcommand{\be}{{\bf e}} \newcommand{\bF}{{\bf F}} \newcommand{\bff}{{\bf f}} \newcommand{\bG}{{\bf G}} \newcommand{\bg}{{\bf g}} \newcommand{\bH}{{\bf H}} \newcommand{\bh}{{\bf h}} \newcommand{\bI}{{\bf I}} \newcommand{\bJ}{{\bf J}} \newcommand{\bK}{{\bf K}} \newcommand{\bL}{{\bf L}} \newcommand{\bl}{{\bf l}} \newcommand{\bM}{{\bf M}} \newcommand{\boldm}{{\bf m}} \newcommand{\bP}{{\bf P}} \newcommand{\bp}{{\bf p}} \newcommand{\bQ}{{\bf Q}} \newcommand{\bq}{{\bf q}} \newcommand{\bR}{{\bf R}} \newcommand{\bS}{{\bf S}} \newcommand{\bs}{{\bf s}} \newcommand{\bT}{{\bf T}} \newcommand{\bt}{{\bf t}} \newcommand{\bU}{{\bf U}} \newcommand{\boldu}{{\bf u}} \newcommand{\bV}{{\bf V}} \newcommand{\bv}{{\bf v}} \newcommand{\bW}{{\bf W}} \newcommand{\cI}{{\cal I}} \newcommand{\cX}{{\cal X}} \newcommand{\cY}{{\cal Y}} \newcommand{\bX}{{\bf X}} \newcommand{\bx}{{\bf x}} \newcommand{\bY}{{\bf Y}} \newcommand{\by}{{\bf y}} \newcommand{\bZ}{{\bf Z}} \newcommand{\bz}{{\bf z}} \) Journal of Behavioral Data Science, 2022, 2 (1), 35–58.
DOI: https://doi.org/10.35566/jbds/v2n1/p2

How to Select the Best Fit Model Among Bayesian Latent Growth Models for Complex Data

Zhenqiu (Laura) Lu*$^{1}$$^{[0000-0001-9482-1368]}$ and Zhiyong Zhang$^{2}$$^{[0000-0003-0590-2196]}$
$^1$ University of Georgia, Athens, GA 30602, USA
zlu@uga.edu
$^2$ University of Notre Dame, Notre Dame, IN 46530, USA
zhiyongzhang@nd.edu
Abstract. The Bayesian approach is becoming increasingly important as it provides many advantages in dealing with complex data. However, there is no well-defined model selection criterion or index in a Bayesian context. To address this challenge, new indices are needed. The goal of this study is to propose new model selection indices and to investigate their performance in the framework of latent growth mixture models with missing data and outliers in a Bayesian context. We consider latent growth models because they are very flexible in modeling complex data and are becoming increasingly popular in statistical, psychological, behavioral, and educational areas. Specifically, this study conducted five simulation studies to cover different cases: latent growth curve models with missing data, latent growth curve models with missing data and outliers, growth mixture models with missing data and outliers, extended growth mixture models with missing data and outliers, and latent growth models with different classes. Simulation results show that almost all of the proposed indices can effectively identify the true model. This study also illustrates the application of these model selection indices in real data analysis.

Keywords: Model selection criterion · Bayesian estimation · Latent growth models · Missing data · Robust method

1 Introduction

The Bayesian approach is becoming increasingly important in estimating models as it provides many advantages in dealing with complex data (e.g., Dunson, 2000). However, there is no well-defined model selection criterion or index in a Bayesian context (e.g., Celeux, Forbes, Robert, & Titterington, 2006). This is due to at least three problems. First, in a Bayesian context there are two versions of deviance because the Bayesian procedure generates Markov chains for each parameter. One version is the deviance evaluated at the posterior parameter estimates. The other is the Monte Carlo estimate of the expected deviance, which can be estimated as the posterior mean of the deviance across a converged Markov chain. In short, the former is the deviance of the averaged estimates, and the latter is the average of the deviances across iterations. The second problem is related to the complexity of raw data. Data often come from heterogeneous populations and almost unavoidably contain outliers and missing values. Estimates from mis-specified models may lead to severely misleading conclusions. The third problem relates to the likelihood function. When latent variables are included in statistical models, the likelihood function can be an observed-data likelihood function, a complete-data likelihood function, or a conditional likelihood function (Celeux et al., 2006). Furthermore, if data come from heterogeneous populations, the class membership indicator may have different versions, for example, a posterior mode or a posterior mean. Also, with missing data, the likelihood function can be constructed in different ways.

1.1 Model Selection Criteria/Indices

Traditional model selection criteria or indices are available for researchers who try to select the best-fit model from a large set of candidate models. Akaike (1974) proposed the Akaike information criterion (AIC), which offers a relative measure of the information lost. For Bayesian models, the Bayes factor, the ratio of posterior odds to prior odds, can be used for both hypothesis testing and model comparison. But the Bayes factor is often difficult or impossible to calculate, especially for models that involve random effects, large numbers of unknowns, or improper priors. To approximate the Bayes factor, Schwarz (1978) developed the Bayesian information criterion (BIC, sometimes called the Schwarz criterion). To obtain more precise indices, Bozdogan (1987) proposed the consistent Akaike information criterion (CAIC), and Sclove (1987) proposed the sample-size adjusted Bayesian information criterion (ssBIC). The deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der Linde, 2002) is a more recently developed criterion designed for hierarchical models. It is based on the posterior distribution of the log-likelihood and is useful in Bayesian model selection problems where the posterior distributions have been obtained by Markov chain Monte Carlo (MCMC) simulation. DIC is usually regarded as a generalization of AIC and BIC; it is defined analogously to AIC and BIC, with a penalty term equal to the effective number of model parameters in a Bayesian model. In practice, the rough DIC (RDIC, or DICV in some literature; e.g., Oldmeadow & Keith, 2011) is used as an approximation of DIC. The mathematical forms of AIC, BIC, CAIC, ssBIC, and DIC are closely related to each other. They all try to find a balance between the accuracy and the complexity of the fitted model. For all of the indices above, the model with a smaller criterion/index value is better supported by the data.

Lu, Zhang, and Cohen (2013) proposed a series of Bayesian model selection indices based on the traditional ones. However, in Lu et al. (2013) the performance of these indices was investigated only when data were non-mixture, normally distributed, and with simple non-ignorable missingness, and only latent growth models were used.

1.2 Goals and Structure

To address the challenges of model selection in a Bayesian context, this paper proposes ten model selection indices. It also examines the performance of these indices under various conditions by conducting five simulation studies covering different latent growth models, such as robust growth models for non-normally distributed data, robust growth mixture models, and extended robust growth mixture models with missing values. We consider latent growth models because they are very flexible in modeling complex data and are becoming increasingly popular in statistical, psychological, behavioral, and educational areas.

The rest of the article consists of five sections. Section 2 presents and formulates three types of models we used in this paper: latent growth models (including growth curve models, growth mixture models, and extended growth mixture models), robust growth models (including three types of robust models), and models that account for missingness (we focus on non-ignorable missingness). Section 3 proposes ten model selection indices in the framework of Bayesian growth models with missing data. Section 4 conducts five simulation studies to evaluate the performance of the Bayesian indices. Model selection results are analyzed, summarized, and compared. Section 5 illustrates the application of these model selection indices in real data analysis. Section 6 discusses the implications and future directions of this study.

2 Latent Growth Models, Robust Growth Models, and Missing Values

Our investigation of the performance of the Bayesian selection indices involves fitting growth models to complex data. In this section, different types of growth models are briefly introduced. Given that the data used in growth models almost inevitably contain attrition (e.g., Little & Rubin, 2002; Lu, Zhang, & Lubke, 2011; Yuan & Lu, 2008) and outliers (e.g., Maronna, Martin, & Yohai, 2006), different types of growth models have been developed, including traditional latent growth curve models with missing data (Lu et al., 2013), robust growth curve models (Zhang, Lai, Lu, & Tong, 2013) with missing data (Lu & Zhang, 2021), growth mixture models (e.g., Bartholomew & Knott, 1999) with missing data (Lu & Zhang, 2014), extended growth mixture models (EGMMs; Muthén & Shedden, 1999) with missing data (Lu & Zhang, 2014), and robust growth mixture models with missing data (Lu & Zhang, 2014).

In the following, we discuss three types of models: latent growth models (including growth curve models, growth mixture models, and extended growth mixture models), robust growth models (including three types of robust models), and models that account for missingness (we focus on non-ignorable missingness). By combining different elements of these models, it becomes possible to consider a series of growth models with a variety of missing data mechanisms and contaminated data.

2.1 Latent Growth Models

The mathematical form of a latent growth curve model is

\[
\begin{cases}
\by_{i} = \bLambda \boldeta_{i} + \be_{i},\\
\boldeta_{i} = \bbeta + \bxi_{i},
\end{cases}
\tag{1}
\]

where $\by _{i}$ is a $T\times 1$ vector of outcomes for participant $i \, (i=1,...,N)$, $\boldeta _{i}$ is a $q\times 1$ vector of latent effects, $\bLambda $ is a $T\times q$ matrix of factor loadings for $\boldeta _{i}$, $\be _{i}$ is a $T\times 1$ vector of residual or measurement errors, $\bbeta $ is a $q\times 1$ vector of fixed effects, and $\bxi _{i}$ captures the variation of $\boldeta _{i}$. Note that $\be _{i}$ and $\bxi _{i}$ are usually assumed to be normally distributed, but this is not necessary. When data have outliers or are heavy-tailed, the normality assumption may bias the estimates. To reduce the effects of outliers, we consider robust models in this study.
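To make the notation concrete, the following sketch simulates data from the linear special case of model (1), with an intercept-and-slope loading matrix. All parameter values here are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, q = 500, 4, 2

# Linear growth: first column of Lambda loads on the intercept, second on the slope.
Lambda = np.column_stack([np.ones(T), np.arange(T)])   # T x q factor loadings
beta = np.array([1.0, 3.0])                            # fixed effects (intercept, slope)
Psi = np.diag([1.0, 4.0])                              # covariance of random effects xi_i

eta = rng.multivariate_normal(beta, Psi, size=N)       # eta_i = beta + xi_i
e = rng.standard_normal((N, T))                        # e_i ~ MN(0, I)
y = eta @ Lambda.T + e                                 # y_i = Lambda * eta_i + e_i

print(y.mean(axis=0))   # close to beta[0] + beta[1]*(0, 1, 2, 3) = (1, 4, 7, 10)
```

With $N=500$ the wave means recover the implied trajectory $1, 4, 7, 10$ up to sampling error.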

A growth mixture model can be expressed as

\[
f(\by_{i}) = \sum_{k=1}^{K} \pi_{k}\, f_{k}(\by_{i}),
\tag{2}
\]

where $\pi _{k}$ is the invariant class probability (or weight) for class $k$, satisfying $0\leq \pi _{k}\leq 1$ and $\sum _{k=1}^K \pi _{k}=1$ (e.g., McLachlan & Peel, 2000), and $f_{k}(\by _{i})\ (k=1,\ldots , K)$ is the density of a latent growth model for class $k$.
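As a numerical illustration of Eq. (2), the mixture density is simply the probability-weighted sum of the class densities. The weights, means, and evaluation point below are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.5, 0.5])                    # class weights, sum to 1
means = [np.zeros(2), np.array([3.0, 3.0])]  # class-specific mean vectors

def mixture_density(y):
    """f(y) = sum_k pi_k f_k(y), with identity covariance in each class."""
    return sum(p * multivariate_normal(mean=m, cov=np.eye(2)).pdf(y)
               for p, m in zip(pi, means))

y = np.array([0.2, -0.1])
print(mixture_density(y))   # dominated by the class-1 component at this point
```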

For extended growth mixture models (EGMMs; Muthén & Shedden, 1999), $\pi _{k}$ is not invariant across individuals. It is allowed to vary individually depending on covariates, so it is written as $\pi _{ik}(\bx _i)$. If a probit link function is used, then

\[
\begin{cases}
\pi_{i1}(\bx_{i}) = \Phi(X_{i}'\,\bvarphi_{1}),\\
\pi_{ik}(\bx_{i}) = \Phi(X_{i}'\,\bvarphi_{k}) - \Phi(X_{i}'\,\bvarphi_{k-1}), \quad (k = 2,3,\ldots,K-1),\\
\pi_{iK}(\bx_{i}) = 1 - \Phi(X_{i}'\,\bvarphi_{K-1}),
\end{cases}
\tag{3}
\]

where $\Phi (\cdot )$ is the cumulative distribution function (CDF) of the standard normal distribution, and $X_{i}=(1, \bx _{i}')'$ with an $r\times 1$ vector of observed covariates $\bx _{i}$. Note that $\Phi (X_{i}' \, \bvarphi _{k})=\sum _{j=1}^k \pi _{ij}(\bx _i)$ and $\Phi (X_{i}' \, \bvarphi _{K}) \equiv 1$.
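Equation (3) can be checked numerically: the probit-link class probabilities are differences of normal CDFs and sum to one by construction. The coefficient vectors $\bvarphi_k$ below are hypothetical and chosen so that the cut-points $X_i'\bvarphi_k$ are increasing in $k$.

```python
import numpy as np
from scipy.stats import norm

K = 3
x_i = np.array([0.4])                 # observed covariate
X_i = np.concatenate(([1.0], x_i))    # X_i = (1, x_i')'

# Hypothetical varphi_1, ..., varphi_{K-1}; X_i' varphi_k must increase in k.
varphi = [np.array([-0.5, 1.0]), np.array([0.8, 1.0])]

cum = [norm.cdf(X_i @ v) for v in varphi]                  # Phi(X_i' varphi_k)
pi_i = np.array([cum[0], cum[1] - cum[0], 1.0 - cum[1]])   # Eq. (3)

print(pi_i, pi_i.sum())   # class probabilities; the sum is exactly 1
```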

A dummy variable $\bz _{i}=(z_{i1},z_{i2},...,z_{iK})'$ is used to indicate the class membership. If individual $i$ comes from group $k$, then $z_{ik}=1$ and $z_{ij}=0$ $(\forall j\neq k)$. $\bz _{i}$ is multinomially distributed (McLachlan & Peel, 2000, p. 7), that is, $\bz _{i} \sim MultiNomial(\pi _{i1},\pi _{i2},...,\pi _{iK})$.

2.2 Robust Growth Models

When data have outliers or are heavy-tailed, robust methods are used to reduce the effects of outliers. As t-distributions are more robust against outliers than normal distributions, the following robust growth models have been proposed (Lu & Zhang, 2021; Zhang et al., 2013).

(1) t-Normal (TN) model in which the measurement errors are t-distributed and the latent random effects are normally distributed,

\[
\begin{cases}
\be_{i} \sim Mt_{T}(\mathbf{0}, \bTheta, \nu),\\
\bxi_{i} \sim MN_{q}(\mathbf{0}, \bPsi),
\end{cases}
\tag{4}
\]

where $Mt_{T}(\textbf {0}, \bTheta , \nu )$ is a $T$-dimensional multivariate t-distribution with a scale matrix $\bTheta $ and degrees of freedom $\nu $, and $MN_{q}(\textbf {0}, \bPsi )$ is a $q$-dimensional multivariate Normal distribution with a covariance matrix $\bPsi $.

(2) Normal-t (NT) model in which the measurement errors are normally distributed but the latent random effects are t-distributed,

\[
\begin{cases}
\be_{i} \sim MN_{T}(\mathbf{0}, \bTheta),\\
\bxi_{i} \sim Mt_{q}(\mathbf{0}, \bPsi, \upsilon).
\end{cases}
\tag{5}
\]

(3) t-t (TT) model in which both the measurement errors and the latent random effects are t-distributed,

\[
\begin{cases}
\be_{i} \sim Mt_{T}(\mathbf{0}, \bTheta, \nu),\\
\bxi_{i} \sim Mt_{q}(\mathbf{0}, \bPsi, \upsilon).
\end{cases}
\tag{6}
\]
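A multivariate t-variate can be generated as a multivariate normal divided by the square root of an independent chi-square over its degrees of freedom. The quick check below (our own illustration, not the paper's code) confirms the heavier tails of $t_5$ errors relative to normal errors.

```python
import numpy as np

rng = np.random.default_rng(1)

def rmvt(n, scale, df, rng):
    """Draw n vectors from a multivariate t with the given scale matrix and df."""
    z = rng.multivariate_normal(np.zeros(scale.shape[0]), scale, size=n)
    w = rng.chisquare(df, size=n) / df          # chi2_df / df mixing variable
    return z / np.sqrt(w)[:, None]

T = 4
e_normal = rng.multivariate_normal(np.zeros(T), np.eye(T), size=100_000)
e_t5 = rmvt(100_000, np.eye(T), df=5, rng=rng)

# Proportion of components beyond +/- 4: t_5 produces far more extreme values.
print((np.abs(e_t5) > 4).mean(), (np.abs(e_normal) > 4).mean())
```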

2.3 Missing Values

We focus on non-ignorable missingness in this paper. To build models with non-ignorable missingness, selection models (Glynn, Laird, & Rubin, 1986; Little, 1993, 1995) are used. For individual $i$, let $\boldm _{i}=(m_{i1},m_{i2},...,m_{iT})'$ be a missing data indicator for $\by _{i}$, with $m_{it}=1$ when $y_{it}$ is missing and $m_{it}=0$ when it is observed. Let $\tau _{it}=p(m_{it}=1)$ be the probability that $y_{it}$ is missing. Then $m_{it}\sim \textrm {Bernoulli}(\tau _{it})$, so its density function is $f(m_{it})=\tau _{it}^{m_{it}}(1-\tau _{it})^{(1-m_{it})}$. The missingness probability $\tau _{it}$ can take different forms. Lu and Zhang (2014) proposed the following non-ignorable missingness mechanisms for mixture models.

(1) Latent-Class-Intercept-Dependent (LCID) missingness in which $\tau _{it}$ is a function of latent class, covariates, and latent individual initial levels. For example, students are more likely to miss a test if their starting levels of that course are low. We model it as follows:

\[
\tau_{it} = \Phi(\bz_{i}'\bgamma_{zt} + I_{i}\gamma_{It} + \bx_{i}'\bgamma_{xt}),
\tag{7}
\]

where $I_{i}$ is the latent initial level for individual $i$, $\gamma _{It}$ is the coefficient for $I_{i}$, $\bgamma _{zt}$ is the coefficient vector for the class membership, and $\bgamma _{xt}$ is the coefficient vector for the covariates. For non-mixture homogeneous growth models, LCID can be simplified to Latent-Intercept-Dependent (LID) missingness without the class membership indicator $\bz _{i}$ and expressed as $\tau _{it} = \Phi (\gamma _{0t} + I_{i}\gamma _{It} + \bx _{i}'\bgamma _{xt})$, where $\gamma _{0t}$ is the intercept.
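The LID mechanism can be sketched directly from its formula: a linear predictor combining an intercept, the latent initial level $I_i$, and the covariate is passed through the probit link to give a missingness probability, and the indicators are then Bernoulli draws. All coefficient values below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
N, T = 300, 4

gamma_0, gamma_I, gamma_x = -1.0, 0.5, -1.5    # hypothetical probit coefficients
I_i = rng.normal(1.0, 1.0, size=N)             # latent initial levels
x_i = rng.normal(1.0, 0.2, size=N)             # observed covariate

# tau_it = Phi(gamma_0t + I_i*gamma_It + x_i*gamma_xt), held constant over t here
lin = gamma_0 + gamma_I * I_i + gamma_x * x_i
tau = norm.cdf(np.tile(lin[:, None], (1, T)))  # N x T missingness probabilities
m = rng.binomial(1, tau)                       # m_it ~ Bernoulli(tau_it)

print(m.mean())   # overall missing-data rate implied by these coefficients
```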

(2) Latent-Class-Slope-Dependent (LCSD) missingness, in which $\tau _{it}$ is a function of latent class, covariates, and latent individual growth slopes. For example, students are more likely to miss a test if their growth in the course is slow. In this case, $\tau _{it}$ can be modeled as

\[
\tau_{it} = \Phi(\bz_{i}'\bgamma_{zt} + S_{i}\gamma_{St} + \bx_{i}'\bgamma_{xt}),
\tag{8}
\]

where $S_{i}$ is the latent slope for individual $i$, and $\gamma _{St}$ is the coefficient for $S_{i}$. Similarly, for non-mixture homogeneous growth models, LCSD is simplified to Latent-Slope-Dependent (LSD) case as $\tau _{it} =\Phi (\gamma _{0t} + S_{i}\gamma _{St} + \bx _{i}'\bgamma _{xt})$.

(3) Latent-Class-Outcome-Dependent (LCOD) missingness, in which $\tau _{it}$ is a function of latent class, covariates, and the potential outcomes that may be missing. For example, a student who feels they are not doing well on a test may be more likely to give up on the rest of the test. We express $\tau _{it}$ as

\[
\tau_{it} = \Phi(\bz_{i}'\bgamma_{zt} + y_{it}\gamma_{yt} + \bx_{i}'\bgamma_{xt}),
\tag{9}
\]

where $y_{it}$ is the potential outcome for individual $i$ at time $t$, and $\gamma _{yt}$ is the coefficient for $y_{it}$. LCOD can be simplified to Latent-Outcome-Dependent (LOD) missingness for non-mixture homogeneous growth models, with a probability of missingness $\tau _{it} =\Phi (\gamma _{0t} + y_{it} \gamma _{yt}+\bx _{i}'\bgamma _{xt})$.

In a more general framework, LCID and LCSD can be further subsumed under Latent-Class-Random-Effect-Dependent missingness, as the intercept and slope are different random effects depending on the situation under consideration. For non-mixture structures, LID and LSD are similarly subsumed under Latent-Random-Effect-Dependent missingness.

3 Bayesian Model Selection Indices

In this section, we propose ten model selection criteria in the framework of Bayesian growth models with missing data. The definitions of the selection criteria are listed in Table 1. The model selection criteria in the table are based on the two versions of deviance in the Bayesian context, $E_{\theta |y}[D(\theta )]$ and $D(E_{\theta |y}[\theta ])$. As discussed in the introduction, $E_{\theta |y}[D(\theta )]$ is the expected value of the deviances across iterations, and $D(E_{\theta |y}[\theta ])$ is the deviance evaluated at the expected parameter values. The detailed mathematical forms of these two deviances differ across models. In this paper, we focus on both homogeneous and heterogeneous latent growth models with non-ignorable missing data.



Table 1: Model Selection Indices

Index = Deviance + Penalty

Index         Deviance                     Penalty
Dbar.AIC$^1$  $\overline {D(\theta )}^4$   $2p$
Dbar.BIC$^2$  $\overline {D(\theta )}$     $\log (N)\,p$
Dbar.CAIC     $\overline {D(\theta )}$     $(\log (N)+1)\,p$
Dbar.ssBIC    $\overline {D(\theta )}$     $\log ((N+2)/24)\,p$
RDIC          $\overline {D(\theta )}$     $\mathrm{var}(D(\theta ))/2$
Dhat.AIC      $D(\hat {\theta })^5$        $2p$
Dhat.BIC      $D(\hat {\theta })$          $\log (N)\,p$
Dhat.CAIC     $D(\hat {\theta })$          $(\log (N)+1)\,p$
Dhat.ssBIC    $D(\hat {\theta })$          $\log ((N+2)/24)\,p$
DIC$^3$       $D(\hat {\theta })$          $2p_D$

Note.

1. $p$ is the number of parameters.

2. $N$ is the sample size.

3. $p_D=\overline {D(\theta )}-D(\hat {\theta })$.

4. $\overline {D(\theta )}$ is shown as in eqn.(10) for growth curve models and as in eqn.(13) for growth mixture models.

5. $D(\hat {\theta })$ is shown as in eqn.(12) for growth curve models and as in eqn.(14) for growth mixture models.


We first look at the homogeneous growth curve models with non-ignorable missing data. One version of the deviance, $E_{\theta |y}[D(\theta )]$, is approximated by

\[
E_{\theta|y}[D(\theta)] \approx \overline{D(\theta)} = -\frac{2}{S}\sum_{s=1}^{S}\sum_{i=1}^{N}\sum_{t=1}^{T} l_{it}^{(s)}(\theta|y,m)
= -\frac{2}{S}\sum_{s=1}^{S}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[(1-m_{it})\, l_{it}^{(s)}(y) + l_{it}^{(s)}(m)\right],
\tag{10}
\]

where $S$ is the number of iterations of the converged Markov chains, $l_{it}^{(s)}(\theta |y,m) = \log (L_{it}^{(s)}(\theta |y,m))$ is a conditional joint loglikelihood function (see Celeux et al., 2006) of $y$ and $m$, and $m_{it}$ is the missing data indicator for individual $i$ at time $t$, with loglikelihood $l_{it}(m) = m_{it}\log (\tau _{it})+(1-m_{it})\log (1-\tau _{it})$, where $\tau _{it}$ is the missing data rate for individual $i$ at time $t$ and is defined differently for the different missingness models in the previous section. When $y_{it}$ is missing, the corresponding likelihood is excluded. So, combining $y$ and $m$, the conditional likelihood function of a selection model with non-ignorable missing data can be expressed as

\[
L_{it}(\theta|y,m) = \left[f(y_{it}|\boldeta_{i})(1-\tau_{it})\right]^{(1-m_{it})}\, \tau_{it}^{m_{it}}.
\tag{11}
\]

The other version of deviance, $D(E_{\theta |y}[\theta ])$, is approximated by

\[
D(E_{\theta|y}[\theta]) \approx D(\hat\theta) = -2\sum_{i=1}^{N}\sum_{t=1}^{T}\left[(1-m_{it})\, l_{it}(y|\hat\theta) + l_{it}(m|\hat\theta)\right],
\tag{12}
\]

where $\hat {\theta }$ is the posterior mean of parameter estimates across $S$ iterations.
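For a toy one-parameter model (a normal mean with known variance, not the growth models of this paper), the two deviance versions and the resulting $p_D$ and DIC can be computed from posterior draws as follows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)      # toy data; sigma = 1 assumed known

# Exact conjugate posterior of the mean under a flat prior: N(ybar, 1/n).
S = 2000
theta_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=S)

def deviance(theta):
    """D(theta) = -2 * log-likelihood of the data at theta."""
    return -2.0 * norm.logpdf(y, loc=theta, scale=1.0).sum()

D_bar = np.mean([deviance(t) for t in theta_draws])   # E_{theta|y}[D(theta)]
D_hat = deviance(theta_draws.mean())                  # D(E_{theta|y}[theta])
p_D = D_bar - D_hat                                   # effective number of parameters
DIC = D_hat + 2.0 * p_D                               # equivalently D_bar + p_D

print(round(p_D, 2), round(DIC, 1))   # p_D should be close to 1 for one parameter
```

The same recipe applies to the growth models above once the loglikelihood in Eqs. (10) and (12) replaces the toy `deviance` function.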

For growth mixture models with missing data, $E_{\theta |y}[D(\theta )]$ is expressed as

\[
E_{\theta|y}[D(\theta)] \approx \overline{D(\theta)} = -\frac{2}{S}\sum_{s=1}^{S}\sum_{i=1}^{N}\sum_{k=1}^{K} z_{ik}^{(s)}\sum_{t=1}^{T}\left[(1-m_{it})\, l_{ikt}^{(s)}(y) + l_{ikt}^{(s)}(m)\right],
\tag{13}
\]

where $\bz _{i}=(z_{i1},z_{i2},...,z_{iK})'$ is the class membership indicator, which follows a multinomial distribution, $\bz _{i} \sim MultiNomial(\pi _{i1},\pi _{i2},...,\pi _{iK})$, and $z_{ik}^{(s)}$ is the class membership estimated at iteration $s$. Similarly,

\[
D(E_{\theta|y}[\theta]) \approx D(\hat\theta) = -2\sum_{i=1}^{N}\sum_{k=1}^{K}\hat z_{ik}\sum_{t=1}^{T}\left[(1-m_{it})\, l_{ikt}(y|\hat\theta) + l_{ikt}(m|\hat\theta)\right],
\tag{14}
\]

where $\hat {z}_{ik}$ is the posterior mode of the class membership, and $\hat {\theta }$ is the posterior mean of the parameter estimates across all $S$ iterations. In both the $\overline {D(\theta )}$ and $D(\hat {\theta })$ definitions of the deviance, $l_{ikt}(y)$ and $l_{ikt}(m)$ are the conditional loglikelihood functions of $y_{it}$ and $m_{it}$, respectively, for individual $i$ in class $k$ at time $t$.

The difference between $\overline {D(\theta )}$ and $D(\hat {\theta })$ is quantified by a statistic called $p_D$ (Spiegelhalter et al., 2002),

\[
p_D = \overline{D(\theta)} - D(\hat\theta).
\tag{15}
\]

Based on Jensen's inequality (Casella & George, 1992), when $D(\theta )$ is convex, $\overline {D(\theta )} \geq D(\hat {\theta })$, so $p_D$ is positive; when $D(\theta )$ is concave, $\overline {D(\theta )} \leq D(\hat {\theta })$ and $p_D$ is negative.

4 Simulation Studies

In this section, five simulation studies are conducted to evaluate the performance of the Bayesian indices. For each study, four waves of complete data are generated first, and then missing data are created on each occasion according to pre-designed missing data rates. After the data are generated, full Bayesian methods are used: uninformative priors are adopted, conditional posterior distributions are obtained through a data augmentation algorithm, Markov chains are generated through a Gibbs sampling procedure, convergence is tested, and statistical inference is made for the model parameters. For all simulations, the software OpenBUGS is used for the implementation of Gibbs sampling, and R is used for data generation, convergence testing, and parameter estimation.

The five studies are designed such that the data complexity increases from study 1 to study 5. Studies 1-2 focus on non-mixture growth data; thus, latent growth curve models with missing data are used. Studies 3-5 focus on mixture growth data; thus, growth mixture models with missing data are used. Simulation factors include measurement error distributions, random-effects distributions, missingness patterns, sample size, and class separation (Anderson & Bahadur, 1962). Under each condition, 100 converged replications are used to calculate the model selection proportions. Table 2 lists the design details.


Table 2: Simulation Study Design

Study 1 investigated the performance of the Bayesian indices when data were non-mixture, homogeneous, and normally distributed with non-ignorable missingness. The true model was NN-XS, the model with normally distributed measurement errors ($\be _{i}$) at level 1 and random effects ($\bxi _i$) at level 2, with missingness depending on the covariate $x$ and the latent slope $S$. Specifically, $\be _{i} \sim MN(\textbf {0}, \bI )$ and $\boldeta _{i} \sim MN_{q}(\bbeta ,\bPsi )$, where $\bbeta =\textrm {(Intercept, Slope)}=(1,3)$ and $\bPsi $ was a 2 by 2 symmetric matrix with $Var(I)=1$, $Cov(I,S)=0$, and $Var(S)=4$. For the missingness, the bigger the latent slope, the higher the missing data rate. The missingness probit coefficients were set as $\gamma _0=(-1,-1,-1,-1)$, $\gamma _x=(-1.5,-1.5,-1.5,-1.5)$, and $\gamma _S=(0.5,0.5,0.5,0.5)$. For example, if a participant had a latent growth slope of 3 and a covariate value of 1, then his or her missing probability at each wave was $\tau \approx 16\%$; if the slope was 5 with the same covariate value, the missing probability increased to $\tau =50\%$; but if the slope was 1, the missing probability decreased to $\tau =2.3\%$. The covariate $x$ was also generated from a normal distribution, $x \sim N(1, sd=0.2)$. In study 1, there were in total 16 conditions: 4 missingness mechanisms (XS non-ignorable, XY non-ignorable, XI non-ignorable, and ignorable) combined with 4 levels of sample size (1000, 500, 300, and 200). Table 3 lists the model selection proportions across 100 replications for each of these indices across all conditions in study 1. The largest proportion across the 4 missingness models is indicated by the shaded cell for each index. When the sample size is relatively large, 1000 or 500, all of the model selection indices, except for the rough DIC (RDIC), correctly identify the true model 100% of the time. When the sample size becomes smaller, 300 or 200, all of the model selection indices except the RDIC choose the true model with probability above 93%. Comparing the indices defined based on Dbar with those defined based on Dhat, one can see that the former perform a little better.
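The missing-probability figures quoted above follow directly from the probit coefficients; the quick check below (using SciPy's normal CDF) reproduces them.

```python
from scipy.stats import norm

# tau = Phi(gamma_0 + gamma_x * x + gamma_S * S) with gamma_0 = -1,
# gamma_x = -1.5, gamma_S = 0.5, and covariate x = 1.
for slope in (3, 5, 1):
    tau = norm.cdf(-1.0 - 1.5 * 1.0 + 0.5 * slope)
    print(f"slope {slope}: tau = {tau:.3f}")
# slope 3: tau = 0.159; slope 5: tau = 0.500; slope 1: tau = 0.023
```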



Table 3: Model Selection Proportion in Study 1

                            N=1000                                    N=500
                 Non-ignorable              Ignorable     Non-ignorable              Ignorable
Criterion$^1$    NN-XS$^2$ NN-XY$^3$ NN-XI$^4$  NN$^5$    NN-XS    NN-XY    NN-XI    NN
Dbar.AIC         1$^6$     0.000     0.000      0.000     1        0.000    0.000    0.000
Dbar.BIC         1         0.000     0.000      0.000     1        0.000    0.000    0.000
Dbar.CAIC        1         0.000     0.000      0.000     1        0.000    0.000    0.000
Dbar.ssBIC       1         0.000     0.000      0.000     1        0.000    0.000    0.000
RDIC             0.013     0.000     0.987      0.000     0.038    0.000    0.962    0.000
Dhat.AIC         1         0.000     0.000      0.000     1        0.000    0.000    0.000
Dhat.BIC         1         0.000     0.000      0.000     1        0.000    0.000    0.000
Dhat.CAIC        1         0.000     0.000      0.000     1        0.000    0.000    0.000
Dhat.ssBIC       1         0.000     0.000      0.000     1        0.000    0.000    0.000
DIC              1         0.000     0.000      0.000     1        0.000    0.000    0.000
                            N=300                                     N=200
Dbar.AIC         0.98125   0.01875   0.000      0.000     0.975    0.025    0.000    0.000
Dbar.BIC         0.98125   0.01875   0.000      0.000     0.975    0.025    0.000    0.000
Dbar.CAIC        0.98125   0.01875   0.000      0.000     0.975    0.025    0.000    0.000
Dbar.ssBIC       0.98125   0.01875   0.000      0.000     0.975    0.025    0.000    0.000
RDIC             0.1125    0.000     0.8875     0.000     0.2      0.03125  0.76875  0.000
Dhat.AIC         0.95      0.05      0.000      0.000     0.9375   0.06875  0.000    0.000
Dhat.BIC         0.95      0.05      0.000      0.000     0.9375   0.06875  0.000    0.000
Dhat.CAIC        0.95      0.05      0.000      0.000     0.9375   0.06875  0.000    0.000
Dhat.ssBIC       0.95      0.05      0.000      0.000     0.9375   0.06875  0.000    0.000
DIC              1         0.000     0.000      0.000     0.98125  0.0125   0.00625  0.000
Note.

1. The definition of each index is given in Table 1.

2. The shaded model is the true model. The model is normal-distribution-based with
latent-slope-dependent missingness.

3. The model is normal-distribution-based with potential-outcome-dependent missingness.

4. The model is normal-distribution-based with latent-intercept-dependent missingness.

5. The model is normal-distribution-based with ignorable missingness.

6. The shaded cell has the largest proportion.


Study 2 investigated the performance of these indices when data were non-mixture and homogeneous, with outliers and non-ignorable missingness. The main difference between studies 2 and 1 was that the data in study 2 contained outliers, so they were not normally distributed, and robust growth curve models were used. The true model was TN-XS, meaning that the measurement errors ($\be _i$) at level 1 followed a t-distribution. Specifically, $\be _{i}$ were generated from a $t$ distribution with 5 degrees of freedom and a scale matrix $\bI $, i.e., $\be _{i} \sim Mt(\textbf {0}, \bI , 5)$. Other settings were kept the same as in study 1. In this study, in total 32 conditions were considered: 4 data distributions (NN, TN, NT, and TT), 4 missingness patterns (XS non-ignorable, XY non-ignorable, XI non-ignorable, and ignorable), and 2 levels of sample size (1000 and 500). Table 4 lists the model selection proportions. The largest proportion across the 16 missingness models is indicated by the shaded cell for each index. Except for the RDIC, all of the model selection indices correctly identify the true model. TT-XS is a competing model that also gains high selection probabilities. This is because the normal distribution is almost identical to a t-distribution with large degrees of freedom, and the degrees of freedom of the t-distribution are also estimated by the model. Again, the Dbar-based indices perform a little better than the Dhat-based indices; among them, the Dbar-based BIC and CAIC perform best.



Table 4: Model Selection Proportion in Study 2

                              N=1000                            N=500
                    Non-ignorable        Ignorable    Non-ignorable        Ignorable
Index       Model   XS$^5$  XY      XI                XS      XY      XI
Dbar.AIC    TN$^1$  0.519   0.000   0.000   0.000     0.597   0.013   0.000   0.000
            TT$^2$  0.469   0.000   0.000   0.012     0.377   0.000   0.000   0.000
            NT$^3$  0.000   0.000   0.000   0.000     0.006   0.000   0.000   0.000
            NN$^4$  0.000   0.000   0.000   0.000     0.006   0.000   0.000   0.000
Dbar.BIC    TN      0.781   0.000   0.000   0.000     0.855   0.013   0.000   0.000
            TT      0.200   0.000   0.000   0.019     0.113   0.000   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.006   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.013   0.000   0.000   0.000
Dbar.CAIC   TN      0.819   0.000   0.000   0.000     0.888   0.012   0.000   0.000
            TT      0.162   0.000   0.000   0.019     0.075   0.000   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.019   0.000   0.000   0.000
Dbar.ssBIC  TN      0.625   0.000   0.000   0.000     0.631   0.012   0.000   0.000
            TT      0.362   0.000   0.000   0.012     0.338   0.000   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.006   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.006   0.000   0.000   0.000
RDIC        TN      0.000   0.000   0.106   0.000     0.000   0.000   0.094   0.000
            TT      0.000   0.000   0.100   0.000     0.000   0.000   0.113   0.000
            NT      0.000   0.000   0.394   0.000     0.000   0.000   0.390   0.000
            NN      0.000   0.000   0.400   0.000     0.000   0.000   0.403   0.000
Dhat.AIC    TN      0.544   0.000   0.000   0.000     0.547   0.025   0.000   0.000
            TT      0.506   0.006   0.000   0.000     0.447   0.019   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
Dhat.BIC    TN      0.675   0.006   0.000   0.000     0.717   0.025   0.000   0.000
            TT      0.319   0.000   0.000   0.000     0.245   0.013   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
Dhat.CAIC   TN      0.700   0.006   0.000   0.000     0.788   0.025   0.000   0.000
            TT      0.294   0.006   0.000   0.000     0.169   0.012   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
Dhat.ssBIC  TN      0.575   0.006   0.000   0.000     0.588   0.025   0.000   0.000
            TT      0.419   0.006   0.000   0.000     0.369   0.012   0.000   0.000
            NT      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
            NN      0.000   0.000   0.000   0.000     0.000   0.000   0.000   0.000
DIC         TN      0.325   0.000   0.000   0.000     0.415   0.006   0.000   0.000
            TT      0.462   0.000   0.000   0.194     0.409   0.000   0.000   0.000
            NT      0.012   0.000   0.000   0.000     0.088   0.000   0.000   0.000
            NN      0.006   0.000   0.000   0.000     0.082   0.000   0.000   0.000

Note. $^{1-4}$T-Normal, T-T, Normal-T, and Normal-Normal measurement errors and random effects. $^5$Other abbreviations are as given in Table 3.

Study 3 was designed for mixture data with outliers and non-ignorable missing data. Because the data were mixtures, growth mixture models were used. The true model in this study was a 2-class mixture TN-XS RGMM. Only the intercepts of the 2 classes differed, with 5 for class 1 and 1 for class 2; other settings for each class were the same as in study 2, and both classes had $t_5$ distributed measurement errors. Based on Anderson and Bahadur (1962), the class separation was around $2.7$. We assumed a traditional mixture model, i.e., the class probabilities were fixed at (50%, 50%). As in study 2, 32 conditions were considered, crossing 4 data distributions (NN, TN, NT, and TT), 4 missingness patterns (XS non-ignorable, XY non-ignorable, XI non-ignorable, and ignorable), and 2 levels of sample size (1500 and 1000). Because mixture data require more observations to obtain stable estimates, we increased the sample sizes. Table 5 shows the results for study 3; the shaded cell indicates the largest proportion across the 16 missingness models for each index. Again, almost all of the model selection indices correctly identify the true model, and the Dbar-based indices perform slightly better than the Dhat-based indices. Specifically, Dbar-based BIC and CAIC perform best among these indices, followed by Dbar-based ssBIC, which also performs well.
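The separation between two classes can be sketched as a Mahalanobis distance between the class mean vectors under a common covariance, the equal-covariance case of Anderson and Bahadur (1962). The intercept variance of 2.2 below is an assumed illustrative value, chosen only so the distance lands near the reported 2.7; it is not a setting taken from the paper:

```python
import numpy as np

def mahalanobis_separation(mu1, mu2, sigma):
    """Mahalanobis distance between two class mean vectors under a
    common covariance matrix (Anderson & Bahadur, 1962,
    equal-covariance case)."""
    d = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))

# Intercept means 5 and 1 as in study 3; the variance 2.2 is assumed
# for illustration only.
sep = mahalanobis_separation([5.0], [1.0], [[2.2]])  # about 2.70
```

With a single growth factor this reduces to the mean difference divided by the common standard deviation, so a gap of 4 with a standard deviation near 1.48 gives the reported separation of roughly 2.7.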



Table 5: Model Selection Proportion in Study 3

                          N=1500                              N=1000
                          Non-ignorable           Ignorable   Non-ignorable           Ignorable
Index        Model        XS      XY      XI                  XS      XY      XI
Dbar.AIC     TN           0.621   0.000   0.000   0.000       0.593   0.000   0.000   0.000
             TT           0.357   0.000   0.000   0.000       0.314   0.000   0.000   0.000
             NT           0.000   0.000   0.000   0.000       0.021   0.000   0.000   0.000
             NN           0.021   0.000   0.000   0.000       0.071   0.000   0.000   0.000
Dbar.BIC     TN           0.864   0.000   0.000   0.000       0.843   0.000   0.000   0.000
             TT           0.114   0.000   0.000   0.000       0.064   0.000   0.000   0.000
             NT           0.000   0.000   0.000   0.000       0.014   0.000   0.000   0.000
             NN           0.021   0.007   0.000   0.000       0.079   0.000   0.000   0.000
Dbar.CAIC    TN           0.893   0.000   0.000   0.000       0.857   0.000   0.000   0.000
             TT           0.079   0.000   0.000   0.000       0.043   0.000   0.000   0.000
             NT           0.000   0.000   0.000   0.000       0.007   0.007   0.000   0.000
             NN           0.021   0.007   0.000   0.000       0.086   0.000   0.000   0.000
Dbar.ssBIC   TN           0.729   0.000   0.000   0.000       0.750   0.000   0.000   0.000
             TT           0.250   0.000   0.000   0.000       0.157   0.000   0.000   0.000
             NT           0.000   0.000   0.000   0.000       0.014   0.000   0.000   0.000
             NN           0.021   0.007   0.000   0.000       0.079   0.000   0.000   0.000
RDIC         TN           0.071   0.000   0.000   0.000       0.143   0.000   0.000   0.000
             TT           0.086   0.000   0.000   0.000       0.071   0.000   0.000   0.000
             NT           0.450   0.000   0.000   0.000       0.393   0.007   0.000   0.000
             NN           0.393   0.000   0.000   0.000       0.379   0.007   0.000   0.000
Dhat.AIC     TN           0.586   0.000   0.000   0.000       0.621   0.000   0.000   0.000
             TT           0.379   0.000   0.000   0.000       0.329   0.000   0.000   0.000
             NT           0.014   0.000   0.000   0.000       0.014   0.007   0.000   0.000
             NN           0.014   0.007   0.000   0.000       0.057   0.000   0.000   0.000
Dhat.BIC     TN           0.757   0.000   0.000   0.000       0.793   0.000   0.000   0.000
             TT           0.207   0.000   0.000   0.000       0.121   0.000   0.000   0.000
             NT           0.007   0.000   0.000   0.000       0.007   0.007   0.000   0.000
             NN           0.021   0.007   0.000   0.000       0.071   0.000   0.000   0.000
Dhat.CAIC    TN           0.757   0.000   0.000   0.000       0.814   0.000   0.000   0.000
             TT           0.207   0.000   0.000   0.000       0.100   0.000   0.000   0.000
             NT           0.007   0.000   0.000   0.000       0.007   0.007   0.000   0.000
             NN           0.021   0.007   0.000   0.000       0.071   0.000   0.000   0.000
Dhat.ssBIC   TN           0.586   0.000   0.000   0.000       0.664   0.000   0.000   0.000
             TT           0.379   0.000   0.000   0.000       0.250   0.000   0.000   0.000
             NT           0.014   0.000   0.000   0.000       0.014   0.007   0.000   0.000
             NN           0.014   0.007   0.000   0.000       0.064   0.000   0.000   0.000
DIC          TN           0.507   0.000   0.000   0.000       0.364   0.007   0.000   0.000
             TT           0.371   0.000   0.000   0.000       0.286   0.000   0.000   0.000
             NT           0.043   0.036   0.000   0.000       0.129   0.029   0.007   0.000
             NN           0.043   0.000   0.000   0.000       0.150   0.029   0.000   0.000

Note. Same as Table 3.

Study 4 extended study 3 such that the class probabilities were not fixed; instead, they depended on the values of covariates. Also, the non-ignorable missingness in this study was allowed to depend on each observation's latent class membership. The true model in this study was a 2-class mixture TN-CXS robust extended growth mixture model (REGMM). The differences between this study and study 3 were that (1) the class proportions were predicted by the value of the covariate $x$; (2) the missing data rates were predicted by the latent class membership; and (3) both a medium class separation, 2.7, and a small one, 1.7, were used. Specifically, for the small class separation, the intercept for class 1 was 3.5 and the intercept for class 2 was 1. To simplify the simulation, based on the findings in study 3, 5 competing mixture models (TN-CXS, TT-CXS, TN-CX, NN-CXS, and NN-CX) were chosen to fit the data. In total, we considered 20 conditions, crossing 5 mixture models, 2 levels of sample size (1500 and 1000), and 2 levels of class separation (2.7 and 1.7). Table 6 shows the model selection proportions in study 4. Again, almost all of the model selection indices correctly identify the true model. Specifically, Dbar-based BIC and CAIC perform best among these indices.
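A minimal sketch of how class probabilities can depend on a covariate, using the two-class logistic (multinomial logit) form commonly used in extended growth mixture models; the coefficient values below are hypothetical, not settings from the paper:

```python
import numpy as np

def class_probability(x, gamma0, gamma1):
    """Covariate-dependent probability of membership in class 1 under a
    two-class logistic model:
    P(class 1 | x) = 1 / (1 + exp(-(gamma0 + gamma1 * x)))."""
    eta = gamma0 + gamma1 * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

# gamma0 = 0 and gamma1 = 0.5 are illustrative values only
p = class_probability([0.0, 1.0], gamma0=0.0, gamma1=0.5)
# at x = 0 the two classes are equally likely; a positive gamma1
# shifts observations with larger x toward class 1
```

The same idea carries over to the missingness side of the model, where a probit or logit regression lets the missing-data rate depend on the latent class indicator instead of (or in addition to) the covariate.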



Table 6: Model Selection Proportion in Study 4
Index        TN-CXS   TT-CXS   NN-CXS   TN-CX   NN-CX
Class Separation=2.7, N=1500
Dbar.AIC     0.567    0.425    0.000    0.008   0.000
Dbar.BIC     0.808    0.158    0.000    0.033   0.000
Dbar.CAIC    0.850    0.108    0.000    0.042   0.000
Dbar.ssBIC   0.667    0.300    0.000    0.033   0.000
RDIC         0.042    0.042    0.908    0.000   0.008
Dhat.AIC     0.475    0.392    0.000    0.133   0.000
Dhat.BIC     0.550    0.233    0.000    0.217   0.000
Dhat.CAIC    0.525    0.233    0.000    0.242   0.000
Dhat.ssBIC   0.467    0.367    0.000    0.167   0.000
DIC          0.467    0.500    0.033    0.000   0.000
Class Separation=2.7, N=1000
Dbar.AIC     0.558    0.375    0.000    0.067   0.000
Dbar.BIC     0.750    0.125    0.000    0.125   0.000
Dbar.CAIC    0.767    0.100    0.008    0.125   0.000
Dbar.ssBIC   0.633    0.292    0.000    0.075   0.000
RDIC         0.092    0.075    0.808    0.000   0.025
Dhat.AIC     0.350    0.358    0.000    0.292   0.000
Dhat.BIC     0.450    0.175    0.000    0.375   0.000
Dhat.CAIC    0.442    0.150    0.000    0.400   0.008
Dhat.ssBIC   0.392    0.300    0.000    0.308   0.000
DIC          0.417    0.450    0.108    0.008   0.017
Class Separation=1.7, N=1500
Dbar.AIC     0.512    0.444    0.044    0.000   0.000
Dbar.BIC     0.744    0.212    0.044    0.000   0.000
Dbar.CAIC    0.781    0.175    0.044    0.000   0.000
Dbar.ssBIC   0.612    0.344    0.044    0.000   0.000
RDIC         0.306    0.238    0.350    0.006   0.100
Dhat.AIC     0.475    0.475    0.031    0.019   0.000
Dhat.BIC     0.712    0.238    0.031    0.019   0.000
Dhat.CAIC    0.712    0.238    0.031    0.019   0.000
Dhat.ssBIC   0.475    0.475    0.031    0.019   0.000
DIC          0.381    0.450    0.169    0.000   0.000
Class Separation=1.7, N=1000
Dbar.AIC     0.550    0.400    0.050    0.000   0.000
Dbar.BIC     0.719    0.194    0.081    0.006   0.000
Dbar.CAIC    0.750    0.162    0.081    0.006   0.000
Dbar.ssBIC   0.638    0.300    0.062    0.000   0.000
RDIC         0.244    0.256    0.362    0.000   0.138
Dhat.AIC     0.694    0.231    0.012    0.062   0.000
Dhat.BIC     0.644    0.294    0.012    0.050   0.000
Dhat.CAIC    0.694    0.231    0.012    0.062   0.000
Dhat.ssBIC   0.575    0.388    0.012    0.025   0.000
DIC          0.344    0.331    0.319    0.000   0.006

Note. Same as Table 3.

Study 5 focused on the number of classes. In this study, growth curve models with different numbers of classes were fitted and compared. In total, 9 conditions were considered, crossing 3 models (TN-XS, TT-XS, and NN-XS) and 3 numbers of classes (1, 2, and 3). The true model was the 2-class mixture TN-XS model. The simulation results for study 5 are presented in Table 7. Among these indices, the Dhat-based indices perform better than the Dbar-based indices. Specifically, Dhat-based BIC and CAIC perform best, and Dhat-based ssBIC and AIC also provide high certainty.
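Across the five studies, the competing criteria differ only in which deviance summary they plug in (the posterior mean deviance, Dbar, or the deviance evaluated at the posterior means, Dhat) and in the penalty term. The sketch below assembles them from MCMC deviance output, assuming the standard AIC/BIC/CAIC/ssBIC penalty forms with $k$ free parameters and sample size $n$; the paper's exact penalty constants may differ:

```python
import numpy as np

def selection_indices(deviance_draws, dev_at_post_mean, k, n):
    """Build Dbar- and Dhat-based criteria from MCMC output.
    Dbar is the posterior mean deviance; Dhat is the deviance at the
    posterior parameter means; pD = Dbar - Dhat is the effective
    number of parameters, and DIC = Dbar + pD (Spiegelhalter et al.,
    2002). Smaller values indicate better fit."""
    dbar = float(np.mean(deviance_draws))
    dhat = float(dev_at_post_mean)
    p_d = dbar - dhat
    crit = {"DIC": dbar + p_d}
    for label, d in (("Dbar", dbar), ("Dhat", dhat)):
        crit[label + ".AIC"] = d + 2 * k
        crit[label + ".BIC"] = d + k * np.log(n)
        crit[label + ".CAIC"] = d + k * (np.log(n) + 1)
        crit[label + ".ssBIC"] = d + k * np.log((n + 2) / 24)
    return crit

# Synthetic deviance draws purely for illustration
crit = selection_indices([100.0, 102.0, 98.0, 100.0],
                         dev_at_post_mean=96.0, k=5, n=1000)
# Dbar = 100, pD = 4, so DIC = 104
```

Because every criterion in each family shares the same deviance summary, the within-family differences are driven entirely by the penalty: CAIC and BIC penalize complexity hardest, which is consistent with their stronger concentration on the true model in Tables 4 through 7.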



Table 7: Model Selection Proportion in Study 5
                   2 Classes                   1 Class                     3 Classes
Index          TN-XS   TT-XS   NN-XS      TN-XS   TT-XS   NN-XS      TN-XS   TT-XS   NN-XS
Dbar.AIC       0.000   0.000   0.057      0.393   0.129   0.000      0.021   0.007   0.393
Dbar.BIC       0.000   0.000   0.036      0.821   0.064   0.000      0.000   0.000   0.079
Dbar.CAIC      0.000   0.000   0.036      0.864   0.043   0.000      0.000   0.000   0.057
Dbar.ssBIC     0.000   0.000   0.057      0.593   0.100   0.000      0.000   0.000   0.250
RDIC           0.036   0.014   0.200      0.014   0.014   0.679      0.014   0.014   0.014
Dhat.AIC       0.621   0.343   0.064      0.000   0.000   0.000      0.000   0.000   0.000
Dhat.BIC       0.793   0.136   0.071      0.000   0.000   0.000      0.000   0.000   0.000
Dhat.CAIC      0.814   0.114   0.071      0.000   0.000   0.000      0.000   0.000   0.000
Dhat.ssBIC     0.664   0.264   0.071      0.000   0.000   0.000      0.000   0.000   0.000
DIC            0.000   0.000   0.000      0.164   0.193   0.121      0.000   0.000   0.521

Note. Same as Table 3.

5 Application

In this section, a real data set on mathematical growth is analyzed to demonstrate the application of the indices. The same sample analyzed in Lu et al. (2011) is used here. It is a mathematical ability growth sample from the NLSY97 survey (Bureau of Labor Statistics, U.S. Department of Labor, 1997), collected yearly from $N=1510$ adolescents from 1997 to 2001; each year the adolescents were administered the Peabody Individual Achievement Test (PIAT) Mathematics Assessment to measure their mathematical ability. There are outliers at all five grades. Lu et al. (2011) conducted a power transformation to normalize the sample and assumed the data were normally distributed without outliers. In this study, however, we use the original non-transformed data with outliers and apply robust methods. Also, different non-ignorable missingness mechanisms are considered. Overall, the means of mathematical ability increased over time with a roughly linear trend. The missing data rates range from 4.57% to 9.47%, and the raw data show that the missingness pattern is intermittent. About half of the sample is female.



Table 8: Steps and Fitting Models in Real Data Analysis

Step 1: Fit a tentative 2-class model, and check the estimated df of t.

                      $\be _{i}$   $\boldeta _{i}$   Missingness
Model                 N    T       N    T             C    X    I    S    Y
TT-ignorable               ✓            ✓

Step 2: Try models with different missingness and numbers of classes.

2-Class RGMMs
TN-X                       ✓       ✓                       ✓
TN-XI                      ✓       ✓                       ✓    ✓
TN-XS                      ✓       ✓                       ✓         ✓
TN-XY                      ✓       ✓                       ✓              ✓
2-Class REGMMs
TN-CX                      ✓       ✓                  ✓    ✓
TN-CXI                     ✓       ✓                  ✓    ✓    ✓
TN-CXS                     ✓       ✓                  ✓    ✓         ✓
TN-CXY                     ✓       ✓                  ✓    ✓              ✓
3-Class GMMs
NN-X                  ✓            ✓                       ✓
4-Class GMMs
NN-X                  ✓            ✓                       ✓

Step 3: Compare selection indices.

Step 4: Interpret results obtained from the selected model.

Note. Abbreviations are as given in Table 2.


Table 9: Model Selection in Real Data Analysis

                                             2 Classes                                                   3 Classes   4 Classes
Index$^1$    TN-CXS    TN-CXY    TN-CXI    TN-CX     TN-XS     TN-XY     TN-XI     TN-X       NN-X        NN-X
Dbar.AIC     17392     17472     17502     17502     17392     17482     17502     17512      17372       17126
Dbar.BIC     17583.52  17663.52  17693.52  17666.92  17556.92  17646.92  17666.92  17650.32   17536.92    17328.15
Dbar.CAIC    17619.52  17699.52  17729.52  17697.92  17587.92  17677.92  17697.92  17676.32   17567.92    17366.15
Dbar.ssBIC   17469.15  17549.15  17579.15  17568.44  17458.44  17548.44  17568.44  17567.72   17438.44    17207.44
RDIC         22759.24  22704.50  22378.14  22601.28  22562.65  22755.44  22973.52  22520.18   22843.52    23333.20
Dhat.AIC     15192     14942     17482     19822     21922     23622     25722     27352      15872       15716
Dhat.BIC     15383.52  15133.52  17673.52  19986.92  22086.92  23786.92  25886.92  27490.32   16036.92    15918.15
Dhat.CAIC    15419.52  15169.52  17709.52  20017.92  22117.92  23817.92  25917.92  27516.32   16067.92    15956.15
Dhat.ssBIC   15269.15  15019.15  17559.15  19888.44  21988.44  23688.44  25788.44  27407.72   15938.44    15797.44
DIC          19520     19930     17450     15120     12800     11280     9220      7620       18810       18460

Note. $^1$The definition of each index is given in Table 1. The shaded cell has the smallest value in each row.




Table 10: Estimates of TN-CXY REGMM in Real Data Analysis

The analysis is conducted following the steps in Table 8. In step 1, a tentative model (the TT-ignorable model) is fitted to the data, with gender as a covariate. The estimated degrees of freedom of the t distributions for the two classes are $2.342$ and $3.263$ for the measurement errors and $75.65$ and $50.96$ for the random effects, which indicates that the measurement errors are better fitted by a t distribution while the random effects are approximately normally distributed (i.e., a TN model). Then, in step 2, to compare models with different non-ignorable missingness and numbers of classes, 10 models are fitted to the data. During estimation we use uninformative priors, which carry little information about the model parameters. A burn-in period is run first to ensure that estimates are based on Markov chains that have converged. To test convergence, the history plot is examined and Geweke's z statistic (Geweke, 1992) is checked for each parameter. The Geweke's z statistics for all the parameters are smaller than $1.96$ in absolute value, which indicates converged Markov chains. To make sure all the parameters are estimated accurately, the next $50,000$ iterations are then saved for data analysis. The ratio of the Monte Carlo error (MCerror) to the standard deviation (S.D.) for each parameter is smaller than or close to 0.05, which indicates that parameter estimates are accurate (Spiegelhalter, Thomas, Best, & Lunn, 2003). In step 3, model selection indices are used to compare the ten models; the indices are listed in Table 9. And in step 4, the results obtained from the final selected model are interpreted.
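The two convergence checks above can be sketched as follows, on a synthetic chain rather than the WinBUGS output. Note one simplification: Geweke's original diagnostic uses spectral-density estimates of the standard errors, which are replaced here by plain ones, adequate only for weakly autocorrelated draws:

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke (1992) convergence diagnostic: z-score comparing the mean
    of the first 10% of a chain with the mean of the last 50%.
    (Simplified: plain standard errors instead of spectral-density
    estimates.)"""
    a = chain[: int(first * len(chain))]
    b = chain[-int(last * len(chain)):]
    se2 = a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(se2)

def mcerror_sd_ratio(chain, n_batches=50):
    """Monte Carlo error of the posterior mean via batch means, divided
    by the posterior SD; ratios below roughly 0.05 are taken to
    indicate an accurate estimate."""
    means = np.array([b.mean() for b in np.array_split(chain, n_batches)])
    mc_error = means.std(ddof=1) / np.sqrt(n_batches)
    return mc_error / chain.std(ddof=1)

rng = np.random.default_rng(1)
chain = rng.normal(size=50_000)   # stand-in for one parameter's saved draws
z = geweke_z(chain)               # |z| < 1.96 suggests convergence
ratio = mcerror_sd_ratio(chain)   # well below 0.05 for this well-mixed chain
```

In practice both quantities would be computed per parameter from the saved iterations, exactly as reported for the ten fitted models.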

As suggested by Dhat.CAIC, Dhat.ssBIC, Dhat.BIC, and Dhat.AIC, without further substantive information, the TN-CXY model appears to be a good candidate for the best-fitting model. Table 10 provides the results of the TN-CXY REGMM. It can be seen that (1) class 1 has a higher average initial level but a smaller average slope; (2) class 2 has larger variation in initial levels and slopes; (3) the residual variance of class 2 is much larger than that of class 1; (4) in class 1 the initial level and the slope are significantly negatively correlated at the 95% confidence level; (5) the missingness is not related to gender, because none of the coefficients of gender is significant at the $\alpha $ level of $0.05$; (6) at grade 11, adolescents in class 2 are more likely to miss tests than those in class 1, because the probit coefficient of class membership for grade 11 is significantly positive; and (7) at grades 8 and 10, students with higher potential scores are more likely to miss tests than students with lower scores, because the probit coefficients of the potential outcomes $y$ at the two grades are significantly negative.

6 Conclusions, Discussion and Future Research

Based on the results from the five simulation studies, one can conclude that (1) almost all of the model selection indices, except for the rough DIC (RDIC), can correctly choose the true model with high certainty; (2) if the number of classes is correctly identified, the Dbar-based indices perform better than the Dhat-based indices, whereas if candidate models have different numbers of classes, the Dhat-based indices might be used to select the best-fitting model; and (3) across the 5 studies, CAIC and BIC provide higher selection probabilities than ssBIC, AIC, or DIC do. These results will help inform the selection of growth models by researchers seeking accurate estimates of growth across a variety of contexts. The real data analysis demonstrated the application of the indices to typical longitudinal growth studies in educational, psychological, and social research.

This study can be extended in many ways; for example, different versions of the likelihood function or additional model selection indices can be studied and compared using more practical statistical models. (1) As stated in the Introduction, there are at least three challenges in proposing new selection indices, the third of which concerns the likelihood function $l(y|\hat {\theta })$. When latent variables are involved, the likelihood can be an observed-data likelihood, a complete-data likelihood, or a conditional likelihood (Celeux et al., 2006). In this study we used a conditional joint log-likelihood, but the other versions of the likelihood function can be investigated in the future. (2) Another direction for future research is to propose other model selection indices, such as Bayes factors. (3) This study focused on latent growth models only. In the future, the performance of these selection indices can be studied with other statistical models, such as survival models.

Acknowledgment

The study was supported by a grant from the Institute of Education Sciences (R305D210023).

References

   Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. doi: https://doi.org/10.1007/978-1-4612-1694-0_16

   Anderson, T. W., & Bahadur, R. R. (1962). Classification into two multivariate normal distributions with different covariance matrices. The Annals of Mathematical Statistics, 33, 420-431. doi: https://doi.org/10.1214/aoms/1177704568

   Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis: Kendall’s library of statistics (2nd ed., Vol. 7). New York, NY: Edward Arnold.

   Bozdogan, H. (1987). Model selection and Akaike’s Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370. doi: https://doi.org/10.1007/bf02294361

   Bureau of Labor Statistics, U.S. Department of Labor. (1997). National longitudinal survey of youth 1997 cohort, 1997-2003 (rounds 1-7). [computer file]. Produced by the National Opinion Research Center, the University of Chicago and distributed by the Center for Human Resource Research, The Ohio State University. Columbus, OH: 2005. Retrieved from http://www.bls.gov/nls/nlsy97.htm

   Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167-174. doi: https://doi.org/10.2307/2685208

   Celeux, G., Forbes, F., Robert, C., & Titterington, D. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 4, 651-674. doi: https://doi.org/10.1214/06-ba122

   Dunson, D. B. (2000). Bayesian latent variable models for clustered mixed outcomes. Journal of the Royal Statistical Society, B, 62, 355-366. doi: https://doi.org/10.1111/1467-9868.00236

   Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In J. M. Bernado, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (p. 169-193). Oxford, UK: Clarendon Press.

   Glynn, R. J., Laird, N. M., & Rubin, D. B. (1986). Drawing inferences from self-selected samples. In H. Wainer (Ed.), (p. 115-142). New York: Springer Verlag.

   Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134. doi: https://doi.org/10.2307/2290705

   Little, R. J. A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121. doi: https://doi.org/10.1080/01621459.1995.10476615

   Little, R. J. A., & Rubin, D. B.  (2002). Statistical analysis with missing data (2nd ed.). New York, N.Y.: Wiley-Interscience. doi: https://doi.org/10.1002/9781119013563

   Lu, Z., & Zhang, Z. (2014). Robust growth mixture models with non-ignorable missingness data: Models, estimation, selection, and application. Computational Statistics and Data Analysis, 71, 220-240. doi: https://doi.org/10.1016/j.csda.2013.07.036

   Lu, Z., & Zhang, Z. (2021). Bayesian approach to non-ignorable missingness in latent growth models. Journal of Behavioral Data Science, 1(2), 1–30. doi: https://doi.org/10.35566/jbds/v1n2/p1

   Lu, Z., Zhang, Z., & Cohen, A. (2013). New developments in quantitative psychology. In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (Eds.), (Vol. 66, p. 275-304). Springer New York.

   Lu, Z., Zhang, Z., & Lubke, G. (2011). Bayesian inference for growth mixture models with latent-class-dependent missing data. Multivariate Behavioral Research, 46, 567-597. doi: https://doi.org/10.1080/00273171.2011.589261

   Maronna, R. A., Martin, R. D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. John Wiley & Sons, Inc. doi: https://doi.org/10.1002/0470010940

   McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York, NY: John Wiley & Sons. doi: https://doi.org/10.1002/0471721182

   Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55(2), 463-469. doi: https://doi.org/10.1111/j.0006-341x.1999.00463.x

   Oldmeadow, C., & Keith, J. M. (2011). Model selection in Bayesian segmentation of multiple DNA alignments. Bioinformatics, 27, 604-610. doi: https://doi.org/10.1093/bioinformatics/btq716

   Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6 (2), 461-464. doi: https://doi.org/10.1214/aos/1176344136

   Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate analysis. Psychometrika, 52, 333-343. doi: https://doi.org/10.1007/bf02294360

   Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Linde, A. v. d. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583-639. doi: https://doi.org/10.1111/1467-9868.00353

   Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS manual Version 1.4. (MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK, http://www.mrc-bsu.cam.ac.uk/bugs)

   Yuan, K.-H., & Lu, Z. (2008). SEM with missing data and unknown population using two-stage ML: theory and its application. Multivariate Behavioral Research, 43, 621-652. doi: https://doi.org/10.1080/00273170802490699

   Zhang, Z., Lai, K., Lu, Z., & Tong, X. (2013). Bayesian inference and application of robust growth curve models using student’s t distribution. Structural Equation Modeling, 20(1), 47-78. doi: https://doi.org/10.1080/10705511.2013.742382