Journal of Behavioral Data Science
https://jbds.isdsa.org/jbds
<p><strong>ISSN: 2575-8306 (Print)</strong><br /><strong>ISSN: 2574-1284 (Online)</strong><br /><strong>DOI: <a class="urlextern" title="https://dx.doi.org/10.35566/jbds" href="https://dx.doi.org/10.35566/jbds" rel="nofollow">10.35566/jbds</a></strong></p> <p>The Journal of Behavioral Data Science is a peer-reviewed, open-access journal that provides a publication platform, free of charge, for researchers and practitioners in data science and data analytics. The journal is committed to making high-quality research freely accessible: publishing in the journal and accessing its content are completely free for both authors and readers. This allows the widest possible dissemination of research and promotes interdisciplinary collaboration and innovation in the field of behavioral data science.</p>
Publisher: International Society for Data Science and Analytics | Language: en | Journal of Behavioral Data Science | ISSN: 2574-1284
A Tutorial on Bayesian Linear Regression with Compositional Predictors Using JAGS
https://jbds.isdsa.org/jbds/article/view/72
<div id="magicparlabel-2063043" class="abstract">This tutorial offers an exploration of advanced Bayesian methodologies for compositional data analysis, specifically the Bayesian Lasso and Bayesian Spike-and-Slab Lasso (SSL) techniques. Our focus is on a novel Bayesian methodology that integrates Lasso and SSL priors, enhancing both parameter estimation and variable selection for linear regression with compositional predictors. The tutorial is structured to streamline the learning process, breaking down complex analyses into a series of straightforward steps. We demonstrate these methods using R and JAGS, employing simulated datasets to illustrate key concepts. Our objective is to provide a clear and comprehensive understanding of these sophisticated Bayesian techniques, preparing readers to adeptly navigate and apply these methods in their own compositional data analysis endeavors.</div>
Section: Tutorials | Keywords: Bayesian analysis; Compositional data; Lasso; Spike and slab lasso | Authors: Yunli Liu, Xin Tong
Copyright (c) 2024 Journal of Behavioral Data Science
Published: 2024-01-28 | pp. 1-24 | DOI: 10.35566/jbds/tongliu
Conducting Meta-analyses of Proportions in R
https://jbds.isdsa.org/jbds/article/view/60
<p>Meta-analysis of proportions has been widely adopted across various scientific disciplines as a means to estimate the prevalence of phenomena of interest. However, there is a lack of comprehensive tutorials demonstrating the proper execution of such analyses using the R programming language. The objective of this study is to bridge this gap and provide an extensive guide to conducting a meta-analysis of proportions using R. Furthermore, we offer a thorough critical review of the methods and tests involved in conducting a meta-analysis of proportions, highlighting several common practices that may yield biased estimations and misleading inferences. We illustrate the meta-analytic process in five stages: (1) preparation of the R environment; (2) computation of effect sizes; (3) quantification of heterogeneity; (4) visualization of heterogeneity with the forest plot and the Baujat plot; and (5) explanation of heterogeneity with moderator analyses. In the last section of the tutorial, we address the misconception of assessing publication bias in the context of meta-analysis of proportions. The provided code offers readers three options to transform proportional data (e.g., the double arcsine method). The tutorial presentation is conceptually oriented and formula usage is minimal. We will use a published meta-analysis of proportions as an example to illustrate the implementation of the R code and the interpretation of the results.</p>
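The double arcsine transformation mentioned in this abstract can be illustrated outside R. Below is a minimal Python sketch (not the article's code; the function names and the fixed-effect pooling are illustrative assumptions) of the Freeman-Tukey transform and inverse-variance pooling on the transformed scale:

```python
import math

def double_arcsine(x, n):
    # Freeman-Tukey double arcsine transform of x events in n trials;
    # stabilizes the variance of a proportion at roughly 1 / (n + 0.5).
    return math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))

def pool_fixed(events, sizes):
    # Inverse-variance (fixed-effect) pooling on the transformed scale,
    # with weights w_i = n_i + 0.5 (reciprocal of the approximate variance).
    ts = [double_arcsine(x, n) for x, n in zip(events, sizes)]
    ws = [n + 0.5 for n in sizes]
    return sum(w * t for w, t in zip(ws, ts)) / sum(ws)
```

A real analysis would add a random-effects variance component and back-transform the pooled value to the proportion scale, as R packages such as metafor and meta do.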
Section: Tutorials | Keywords: Meta-analysis of proportions; Heterogeneity; Meta-regression; Double arcsine transformation; Baujat plot | Author: Naike Wang
Copyright (c) 2023 Journal of Behavioral Data Science
Published: 2023-11-07 | pp. 64-126 | DOI: 10.35566/jbds/v3n2/wang
Robust Bayesian growth curve modeling: A tutorial using JAGS
https://jbds.isdsa.org/jbds/article/view/67
<p>Latent growth curve models (LGCMs) are widely used in longitudinal data analysis, and robust methods can be used to model error distributions for non-normal data. This tutorial introduces how to fit linear, non-linear, and quadratic growth curve models under the Bayesian framework and uses examples to illustrate how to model errors using t, exponential power, and skew-normal distributions. JAGS model code is provided and implemented through the R package runjags. Model diagnostics and comparisons are briefly discussed.</p>
Section: Tutorials | Keywords: Robust Growth Curve Modeling; Bayesian Estimation; Structural Equation Modeling; JAGS | Author: Ruoxuan Li
Copyright (c) 2023 Journal of Behavioral Data Science
Published: 2023-09-24 | pp. 43-63 | DOI: 10.35566/jbds/v3n2/li
Lasso and Group Lasso with Categorical Predictors: Impact of Coding Strategy on Variable Selection and Prediction
https://jbds.isdsa.org/jbds/article/view/64
<p>Machine learning methods are being increasingly adopted in behavioral research. Lasso regression performs variable selection and regularization, and is particularly appealing to behavioral researchers because of its connection to linear regression. Researchers may expect properties of linear regression to translate to lasso, but we demonstrate that this assumption is problematic for models with categorical predictors. Specifically, we demonstrate that while the coding strategy used for categorical predictors does not impact the performance of linear regression, it does impact lasso’s performance. Group lasso is an alternative to lasso for models with categorical predictors. We investigate the discrepancy between lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, while group lasso performs consistent variable selection but has different prediction accuracy. Using a Monte Carlo simulation, we demonstrate a specific case where group lasso tends to include many variables when few are needed, leading to overfitting. We conclude with recommended solutions to this issue and future directions of exploration to improve the implementation of machine learning approaches in behavioral science. This project shows that when using lasso and group lasso with categorical predictors, the choice of coding strategy should not be ignored.</p>
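To make the coding-strategy point concrete, here is a small Python sketch (not from the article) of two common codings for a three-level factor. Both codings span the same column space, so ordinary least squares predictions agree, but lasso penalizes the resulting coefficients differently:

```python
def dummy_code(level, levels):
    # Dummy (treatment) coding: the reference level (first in `levels`)
    # maps to all zeros; every other level gets its own indicator column.
    return [1.0 if level == lv else 0.0 for lv in levels[1:]]

def effect_code(level, levels):
    # Effect (sum-to-zero) coding: the last level maps to all -1s,
    # so each column's coefficients are deviations from the grand mean.
    if level == levels[-1]:
        return [-1.0] * (len(levels) - 1)
    return [1.0 if level == lv else 0.0 for lv in levels[:-1]]
```

Because the lasso penalty acts on individual columns, the same factor can be selected under one coding and dropped under the other, which is the discrepancy the article investigates.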
Section: Theory and Methods | Keywords: Lasso regression; Categorical predictors; Regularization | Authors: Yihuan Huang, Tristan Tibbe, Amy Tang, Amanda Montoya
Copyright (c) 2023 Journal of Behavioral Data Science
Published: 2024-01-26 | pp. 15-42 | DOI: 10.35566/jbds/v3n2/montoya
Considering the Distributional Form of Zeroes When Calculating Mediation Effects with Zero-Inflated Count Outcomes
https://jbds.isdsa.org/jbds/article/view/58
<p>Recent work has demonstrated how to calculate conditional mediated effects for mediation models with zero-inflated count outcomes in a non-causal framework (O’Rourke & Vazquez, 2019); however, those formulas do not distinguish between logistic and count portions of the data distribution when calculating mediated effects separately for zeroes and counts. When calculating conditional mediated effects for the counts in a zero-inflated count outcome Y, the <em>b</em> path should use the partial derivative of the log-linear regression equation for X and M predicting Y. When calculating conditional mediated effects for the zeroes, the <em>b</em> path should use the partial derivative of the logistic regression equation for X and M predicting Y instead of the log-linear equation. This paper presents adjustments to the analytical formulas of conditional mediated effects for mediation with zero-inflated count outcomes when zeroes and counts are differentially predicted. Using a Monte Carlo simulation, we also empirically show that these adjustments produce different results than when the distributional form of zeroes is ignored.</p>
Section: Theory and Methods | Keywords: Mediation analysis; Count outcomes; Zero-inflation; ZIP; ZINB; Hurdle models | Authors: Holly O'Rourke, Da Eun Han
Copyright (c) 2023 Journal of Behavioral Data Science
Published: 2023-11-10 | pp. 1-14 | DOI: 10.35566/jbds/v3n2/orourke
API Face Value
https://jbds.isdsa.org/jbds/article/view/57
<p>Emotion recognition application programming interface (API) is a recent advancement in computing technology that synthesizes computer vision, machine-learning algorithms, deep-learning neural networks, and other information to detect and label human emotions. The strongest iterations of this technology are produced by technology giants with large cloud infrastructure (e.g., Google and Microsoft), bolstering high true-positive rates. We review the current status of applications of emotion recognition API in psychological research and find that, despite evidence of spatial, age, and race bias effects, API is improving the accessibility of clinical and educational research. Specifically, emotion detection software can assist individuals with emotion-related deficits (e.g., Autism Spectrum Disorder, Attention Deficit-Hyperactivity Disorder, Alexithymia). API has been incorporated in various computer-assisted interventions for Autism, where it has been used to diagnose, train, and monitor emotional responses to one's environment. We identify API's potential to enhance interventions in other emotional dysfunction populations and to address various professional needs. Future work should aim to address the bias limitations of API software and expand its utility in subfields of clinical, educational, neurocognitive, and industrial-organizational psychology.</p>
Section: Literature Review | Keywords: API; Emotion Recognition; Machine Learning; ASD; ADHD; Alexithymia | Authors: Austin Wyman, Zhiyong Zhang
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2023-07-13 | pp. 59-69 | DOI: 10.35566/jbds/v3n1/wyman
On Some Known Derivations and New Ones for The Wishart Distribution: A Didactic
https://jbds.isdsa.org/jbds/article/view/56
<p>The proofs of the probability density function (pdf) of the Wishart distribution tend to be complicated, relying on geometric viewpoints, tedious Jacobians, and algebra that is not self-contained. In this paper, some known proofs and simple new ones for the uncorrelated and correlated cases are provided with didactic explanations. For the new derivation of the uncorrelated case, an elementary direct derivation of the distribution of the Bartlett-decomposed matrix is provided. In the derivation of the correlated case from the uncorrelated one, simple methods, including a new one, are shown.</p>
Section: Theory and Methods | Keywords: Jacobian; Multivariate normality; Probability density function (pdf); Triangular matrix; Bartlett decomposition | Author: Haruhiko Ogasawara
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2023-06-21 | pp. 34-58 | DOI: 10.35566/jbds/v3n1/ogasawara
Using Bayesian Piecewise Growth Curve Models to Handle Complex Nonlinear Trajectories
https://jbds.isdsa.org/jbds/article/view/51
<p>Bayesian growth curve modeling is a popular method for studying longitudinal data. In this study, we discuss a flexible extension, the Bayesian piecewise growth curve model (BPGCM), which allows the researcher to break up a trajectory into phases joined at change points called <em>knots</em>. By fitting BPGCMs, the researcher can specify three or more phases of growth without concern for model identification. Our goal is to provide substantive researchers with a guide for implementing this important class of models. We present a simple application of Bayesian linear BPGCMs to children's math achievement. Our tutorial includes M<em>plus</em> code, strategies for specifying knots, and how to interpret model selection and fit indices. Extensions of the model are discussed.</p>
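The knot idea can be illustrated with the mean structure of a two-phase linear model. A minimal Python sketch (illustrative only; the article itself works in Mplus, and the parameter names here are assumptions):

```python
def piecewise_mean(t, b0, b1, b2, knot):
    # Mean trajectory of a two-phase linear growth model:
    # intercept b0, pre-knot slope b1, post-knot slope b1 + b2,
    # where b2 is the change in slope at the knot.
    return b0 + b1 * t + b2 * max(0.0, t - knot)
```

Adding more `max(0, t - knot_k)` terms yields three or more phases, which is the extension the BPGCM handles without identification concerns.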
Section: Theory and Methods | Keywords: Piecewise Growth Curve Models; Bayesian SEM; Model Selection | Authors: Luca Marvin, Haiyan Liu, Sarah Depaoli
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2023-07-13 | pp. 1-33 | DOI: 10.35566/jbds/v3n1/marvin
Predicting Dyslexia with Machine Learning: A Comprehensive Review of Feature Selection, Algorithms, and Evaluation Metrics
https://jbds.isdsa.org/jbds/article/view/53
<p>This literature review explores the use of machine learning-based approaches for the diagnosis and treatment of dyslexia, a learning disorder that affects reading and spelling skills. Various machine learning models, such as artificial neural networks (ANNs), support vector machines (SVMs), and decision trees, have been used to classify individuals as either dyslexic or non-dyslexic based on functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data. These models have shown promising results for early detection and personalized treatment plans. However, further research is needed to validate these approaches and identify optimal features and models for dyslexia diagnosis and treatment.</p>
Section: Literature Review | Keywords: SVM; EEG; Dyslexia | Author: Velmurugan S
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2023-07-28 | pp. 70-83 | DOI: 10.35566/jbds/v3n1/s
Bayesian IRT in JAGS: A Tutorial
https://jbds.isdsa.org/jbds/article/view/54
<p>Item response modeling is common throughout psychology and education in assessments of intelligence, psychopathology, and ability. The current paper provides a tutorial on estimating the two-parameter logistic and graded response models in a Bayesian framework, as well as an introduction to evaluating convergence and model fit in this framework. Example data are drawn from depression items in the 2017 wave of the National Longitudinal Survey of Youth, and example code is provided for JAGS and implemented through R using the runjags package. The aim of this paper is to provide readers with the necessary information to conduct Bayesian IRT in JAGS.</p>
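The core of the two-parameter logistic model is a single item response function. A minimal Python sketch (illustrative, not the paper's JAGS code):

```python
import math

def p_correct_2pl(theta, a, b):
    # Two-parameter logistic IRT model: probability of endorsing an item
    # with discrimination a and difficulty b, given latent trait theta.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

In a Bayesian formulation, each observed response is then Bernoulli with this probability, with priors placed on theta, a, and b.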
Section: Tutorials | Keywords: Logistic Response Model; Item Response Theory; Bayesian Method; JAGS Tutorial | Author: Kenneth McClure
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2023-03-27 | pp. 84-107 | DOI: 10.35566/jbds/v3n1/mccure
A Tutorial on Bayesian Analysis of Count Data Using JAGS
https://jbds.isdsa.org/jbds/article/view/49
<p>In behavioral studies, the frequency of a particular behavior or event is often collected, and the acquired data are referred to as count data. This tutorial introduces readers to Poisson regression models, which are a more appropriate approach for such data. Meanwhile, count data with excessive zeros often occur in behavioral studies, and models such as zero-inflated or hurdle models can be employed to handle zero-inflation in the count data. In this tutorial, we aim to cover the necessary fundamentals of these methods and to equip readers with the tools to apply them in JAGS. Examples of the implementation of the models in JAGS from within R are provided for demonstration purposes.</p>
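The zero-inflated Poisson (ZIP) likelihood underlying such models mixes a point mass at zero with a Poisson count. A minimal Python sketch of the pmf (illustrative; the tutorial itself specifies the model in JAGS):

```python
import math

def zip_pmf(y, lam, pi):
    # Zero-inflated Poisson: with probability pi the observation is a
    # structural zero; otherwise it is a Poisson(lam) count.
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson
```

A hurdle model differs in that all zeros come from the zero component, and the counts follow a truncated Poisson for y > 0.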
Section: Tutorials | Keywords: Count data; Zero-inflation; Poisson regression; ZIP model; Hurdle model | Author: Sijing Shao
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-12-14 | pp. 156-173 | DOI: 10.35566/jbds/v2n2/shao
Handling Ignorable and Non-ignorable Missing Data through Bayesian Methods in JAGS
https://jbds.isdsa.org/jbds/article/view/48
<p>With the prevalence of missing data in social science research, it is necessary to use methods for handling missing data. One framework in which data with missing values can still be used for parameter estimation is the Bayesian framework. In this tutorial, different missing data mechanisms, including Missing Completely at Random, Missing at Random, and Missing Not at Random, are introduced. Methods for estimating models with missing values under the Bayesian framework for both ignorable and non-ignorable missingness are also discussed. A structural equation model on data from the Advanced Cognitive Training for Independent and Vital Elderly study is used as an illustration of how to fit missing data models in JAGS.</p>
Section: Tutorials | Keywords: Missing Data; Bayesian Analysis; Structural Equation Modeling | Author: Ziqian Xu
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-12-13 | pp. 99-126 | DOI: 10.35566/jbds/v2n2/xu
A Tutorial on Bayesian Latent Class Analysis Using JAGS
https://jbds.isdsa.org/jbds/article/view/47
<p>This tutorial introduces readers to latent class analysis (LCA) as a model-based approach to understand the unobserved heterogeneity in a population. Given the growing popularity of LCA, we aim to equip readers with theoretical fundamentals as well as computational tools. We outline some potential pitfalls of LCA and suggest related solutions. Moreover, we demonstrate how to conduct frequentist and Bayesian LCA in R with real and simulated data. To ease learning, the analysis is broken down into a series of simple steps. Beyond the simple LCA, two extensions including mixed-model LCA and growth curve LCA are provided to aid readers’ transition to more advanced models. The complete R code and data set are provided.</p>
Section: Tutorials | Keywords: Latent class analysis; Mixture models; Bayesian analysis | Author: Meng Qiu
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-12-04 | pp. 127-155 | DOI: 10.35566/jbds/v2n2/qiu
The Performances of Gelman-Rubin and Geweke's Convergence Diagnostics of Monte Carlo Markov Chains in Bayesian Analysis
https://jbds.isdsa.org/jbds/article/view/45
<div id="magicparlabel-353739" class="abstract"> <p>Bayesian statistics have been widely used given the development of Markov chain Monte Carlo sampling techniques and the growth of computational power. A major challenge of Bayesian methods that has not yet been fully addressed is how we can appropriately evaluate the convergence of the random samples to the target posterior distributions. In this paper, we focus on Gelman and Rubin's diagnostic (PSRF), Brooks and Gelman's diagnostic (MPSRF), and Geweke's diagnostics, and compare the Type I error rate and Type II error rate of seven convergence criteria: MPSRF>1.1, any upper bound of PSRF is larger than 1.1, more than 5% of the upper bounds of PSRFs are larger than 1.1, any PSRF is larger than 1.1, more than 5% of PSRFs are larger than 1.1, any Geweke test statistic is larger than 1.96 or smaller than -1.96, and more than 5% of Geweke test statistics are larger than 1.96 or smaller than -1.96. Based on the simulation results, we recommend the upper bound of PSRF if we can only choose one diagnostic. When the number of estimated parameters is large, between the per-parameter diagnostic (i.e., PSRF) and the multivariate diagnostic (i.e., MPSRF), we recommend the upper bound of PSRF over MPSRF. Additionally, we do not suggest claiming convergence at the analysis level while allowing a small proportion of the parameters to have significant convergence diagnostic results.</p> </div>
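For readers who want the mechanics, the core PSRF ratio can be computed in a few lines. A simplified Python sketch (the basic between/within variance ratio without the sampling-variability correction and upper confidence bound used in full implementations such as coda's gelman.diag):

```python
def psrf(chains):
    # Basic Gelman-Rubin potential scale reduction factor for one
    # parameter, given m >= 2 equal-length chains of draws.
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain variance
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m               # within-chain variance
    var_plus = (n - 1) / n * w + b / n                         # pooled variance estimate
    return (var_plus / w) ** 0.5
```

Values well above 1 indicate the chains have not mixed; the 1.1 cutoff compared in the paper is applied to quantities of this kind.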
Section: Theory and Methods | Keywords: Convergence diagnostics; Bayesian analysis; Gelman-Rubin diagnostic; Geweke diagnostic | Authors: Han Du, Zijun Ke, Ge Jiang, Sijia Huang
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-11-14 | pp. 47-72 | DOI: 10.35566/jbds/v2n2/p3
Relative Predictive Performance of Treatments of Ordinal Outcome Variables across Machine Learning Algorithms and Class Distributions
https://jbds.isdsa.org/jbds/article/view/43
<p>Ordinal variables, such as those measured on a five-point Likert scale, are ubiquitous in the behavioral sciences. However, machine learning methods for modeling ordinal outcome variables (i.e., ordinal classification) are not as well-developed or widely utilized, compared to classification and regression methods for modeling nominal and continuous outcomes, respectively. Consequently, ordinal outcomes are often treated “naively” as nominal or continuous outcomes in practice. This study builds upon previous literature that has examined the predictive performance of such naïve approaches of treating ordinal outcome variables compared to ordinal classification methods in machine learning. We conducted a Monte Carlo simulation study to systematically assess the relative predictive performance of an ordinal classification approach proposed by Frank and Hall (2001) against naïve approaches according to two key factors that have received limited attention in previous literature: (1) the machine learning algorithm being used to implement the approaches and (2) the class distribution of the ordinal outcome variable. The consideration of these important, practical factors expands our knowledge on the consequences of naïve treatments of ordinal outcomes, which are shown in this study to vary substantially according to these factors. Given the ubiquity of ordinal measures coupled with the growing presence of machine learning applications in the behavioral sciences, these are important considerations for building high-performing predictive models in the field.</p>
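Frank and Hall's (2001) approach reduces a K-category ordinal problem to K-1 binary problems of the form P(Y > k) and then recombines the binary estimates into class probabilities. A minimal Python sketch of the recombination step (illustrative, not the study's simulation code):

```python
def frank_hall_probs(p_greater):
    # Frank & Hall (2001) recombination: given binary-classifier
    # estimates p_greater[k] = P(Y > k+1) for the first K-1 categories
    # of a K-category ordinal outcome, recover class probabilities via
    # P(Y = k) = P(Y > k-1) - P(Y > k), with the boundary cases below.
    probs = [1.0 - p_greater[0]]
    probs += [p_greater[k - 1] - p_greater[k] for k in range(1, len(p_greater))]
    probs.append(p_greater[-1])
    # Clip negatives that can arise when the binary classifiers are
    # mutually inconsistent (they are trained independently).
    return [max(0.0, p) for p in probs]
```

Each binary problem can be fit with any base learner, which is why the choice of algorithm is one of the two factors the study varies.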
Section: Theory and Methods | Keywords: Ordinal classification; Machine learning; Predictive performance; Class imbalance; Measurement scale | Authors: Honoka Suzuki, Oscar Gonzalez
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-12-16 | pp. 73-98 | DOI: 10.35566/jbds/v2n2/suzuki
A New Bayesian Structural Equation Modeling Approach with Priors on the Covariance Matrix Parameter
https://jbds.isdsa.org/jbds/article/view/41
<p>Bayesian inference for structural equation models (SEMs) is increasingly popular in the social and psychological sciences owing to its flexibility to adapt to more complex models and the ability to include prior information if available. However, there are two major hurdles in using the traditional Bayesian SEM in practice: (1) the information nested in the prior distributions is hard to control, and (2) the MCMC iterative procedures naturally lead to Markov chains with serial dependence, and the diagnostics of their convergence are often difficult. In this study, we present an alternative procedure for Bayesian SEM aiming to address the two challenges. In the new Bayesian SEM procedure, we specify a prior distribution on the population covariance matrix parameter Σ and obtain its posterior distribution <em>p</em>(Σ | data). We then construct a posterior distribution of the model parameters θ in the hypothesized SEM by transforming the posterior distribution of Σ into a distribution of θ. The new procedure eases the practice of Bayesian SEM significantly and gives better control over the information nested in the prior distribution. We evaluate its performance through a simulation study and demonstrate its application through an empirical example.</p>
Section: Theory and Methods | Keywords: Structural equation modeling; Bayesian analysis; Inverse Wishart prior; Informative prior; Convergence diagnostics | Authors: Haiyan Liu, Wen Qu, Zhiyong Zhang, Hao Wu
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-08-07 | pp. 23-46 | DOI: 10.35566/jbds/v2n2/p2
The Impact of Sample Size on Exchangeability in the Bayesian Synthesis Approach to Data Fusion
https://jbds.isdsa.org/jbds/article/view/38
<p>Data fusion approaches have been adopted to facilitate more complex analyses and produce more accurate results. Bayesian Synthesis is a relatively new approach to data fusion where results from the analysis of one dataset are used as prior information for the analysis of the next dataset. Datasets of interest are sequentially analyzed until a final posterior distribution is created, incorporating information from all candidate datasets, rather than simply combining the datasets into one large dataset and analyzing them simultaneously. One concern with this approach lies in the sequence of datasets being fused. This study examines whether the order of datasets matters when the datasets being fused each have substantially different sample sizes. The performance of Bayesian Synthesis with varied sample sizes is evaluated by examining results from simulated data with known population values under a variety of conditions. Results suggest that the order in which the datasets are fused can have a significant impact on the obtained estimates.</p>
Section: Theory and Methods | Keywords: Bayesian synthesis; Data fusion; Exchangeability | Authors: Katerina Marcoulides, Jia Quan, Eric Wright
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-07-26 | pp. 75-105 | DOI: 10.35566/jbds/v2n1/p5
Disentangling the Influence of Data Contamination in Growth Curve Modeling: A Median Based Bayesian Approach
https://jbds.isdsa.org/jbds/article/view/40
<p>Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied with the normal distribution assumption, empirical data often violate this assumption in applications. Failure to account for the deviation from normality in data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers resulting in nonnormality were present. However, this robust approach was shown to perform less satisfactorily when leverage observations existed. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study was conducted to evaluate the numerical performance of the proposed approach and showed that the proposed approach yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.</p>
Section: Theory and Methods | Keywords: Robust methods; Growth curve modeling; Conditional medians; Laplace distribution | Authors: Tonghao Zhang, Xin Tong, Jianhui Zhou
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-07-27 | pp. 1-22 | DOI: 10.35566/jbds/v2n2/p1
How to Select the Best Fit Model among Bayesian Latent Growth Models for Complex Data
https://jbds.isdsa.org/jbds/article/view/39
<p>The Bayesian approach is becoming increasingly important as it provides many advantages in dealing with complex data. However, there is no well-defined model selection criterion or index in a Bayesian context. To address this challenge, new indices are needed. The goal of this study is to propose new model selection indices and to investigate their performances in the framework of latent growth mixture models with missing data and outliers in a Bayesian context. We consider latent growth models because they are very flexible in modeling complex data and are becoming increasingly popular in statistical, psychological, behavioral, and educational areas. Specifically, this study conducted five simulation studies to cover different cases, including latent growth curve models with missing data, latent growth curve models with missing data and outliers, growth mixture models with missing data and outliers, extended growth mixture models with missing data and outliers, and latent growth models with different classes. Simulation results show that almost all proposed indices can effectively identify the true model. This study also illustrated the application of these model selection indices in real data analysis.</p>
Section: Theory and Methods | Keywords: Model Selection Criterion; Bayesian Estimation; Latent Growth Models; Missing Data; Robust Method | Authors: Laura Lu, Zhiyong Zhang
Copyright (c) 2022 Journal of Behavioral Data Science
Published: 2022-06-23 | pp. 35-58 | DOI: 10.35566/jbds/v2n1/p2
Does Minority Case Sampling Improve Performance with Imbalanced Outcomes in Psychological Research?
https://jbds.isdsa.org/jbds/article/view/37
<p>In psychological research, class imbalance in binary outcome variables is a common occurrence, particularly in clinical variables (e.g., suicide outcomes). Class imbalance can present a number of difficulties for inference and prediction, prompting the development of a number of strategies that perform data augmentation through random sampling from just the positive cases, or from both the positive and negative cases. Through evaluation in benchmark datasets from computer science, these methods have shown marked improvements in predictive performance when the outcome is imbalanced. However, questions remain regarding generalizability to psychological data. To study this, we implemented a simulation study testing a number of popular sampling strategies available in easy-to-use software, as well as an empirical example focusing on the prediction of suicidal thoughts. In general, we found that while one sampling strategy demonstrated far worse performance even in comparison to no sampling, the other sampling methods performed similarly, evidencing slight improvements over no sampling. Further, we evaluated the sampling strategies across different forms of cross-validation, model fit metrics, and machine learning algorithms.</p>
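The simplest of the sampling strategies in this family, random minority oversampling, can be sketched in a few lines of Python (an illustrative stand-in, not the study's implementation; packages such as imbalanced-learn provide production versions of this and more sophisticated variants):

```python
import random

def oversample_minority(X, y, seed=0):
    # Random minority oversampling: duplicate minority-class rows,
    # sampled with replacement, until both classes have equal counts.
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]
```

Crucially, any such augmentation must be applied inside the training folds only; oversampling before cross-validation leaks duplicated cases into the test folds and inflates performance estimates.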
Section: Theory and Methods | Keywords: Imbalanced data; Sampling strategies; Machine learning | Authors: Ross Jacobucci, Xiaobei Li
Copyright (c) 2022 Journal of Behavioral Data Science
2022-06-15 | pp. 59–74 | DOI: 10.35566/jbds/v2n1/p3

Book Review: An Introduction to Nonparametric Statistics
https://jbds.isdsa.org/jbds/article/view/34
<pre>This is a brief comparative review of the book An Introduction to Nonparametric Statistics.</pre>
Section: Book and Software Reviews | Keywords: Book review, Nonparametric Statistics, R software | Authors: Kévin Allan Sales Rodrigues
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-05 | pp. 124–127 | DOI: 10.35566/jbds/v2n1/p8

The Role of Personality in Trust in Public Policy Automation
https://jbds.isdsa.org/jbds/article/view/33
<p>Algorithms play an increasingly important role in public policy decision-making. Despite this consequential role, little effort has been made to evaluate the extent to which people trust algorithms in decision-making, much less the personality characteristics associated with higher levels of trust. Such evaluations inform the widespread adoption and efficacy of algorithms in public policy decision-making. We explore the role of major personality inventories -- need for cognition, need to evaluate, the "Big 5" -- in shaping an individual's trust in public policy algorithms, specifically dealing with criminal justice sentencing. Through an original survey experiment, we find strong correlations between all personality types and general levels of trust in automation, as expected. Further, we uncovered evidence that need for cognition increases the weight given to advice from an algorithm relative to humans, and "agreeableness" decreases the distance between respondents' expectations and advice from a judge, relative to advice from a crowd.</p>
Section: Application and Case Studies | Keywords: Personality, Trust in automation, Public policy, Decision-making | Authors: Philip Waggoner, Ryan Kennedy
Copyright (c) 2022 Journal of Behavioral Data Science
2022-05-11 | pp. 106–123 | DOI: 10.35566/jbds/v2n1/p4

The Lighting of the BECONs
https://jbds.isdsa.org/jbds/article/view/26
<p>The imposition of lockdowns in response to the COVID-19 outbreak has underscored the importance of human behavior in mitigating virus transmission. The scientific study of interventions designed to change behavior (e.g., to promote physical distancing) requires measures of effectiveness that are fast, that can be assessed through experiments, and that can be investigated without actual virus transmission. This paper presents a methodological approach designed to deliver such indicators. We show how behavioral data, obtainable through wearable assessment devices or camera footage, can be used to assess the effect of interventions in experimental research; in addition, the approach can be extended to longitudinal data involving contact tracing apps. Our methodology operates by constructing a contact network: a representation that encodes which individuals have been in physical proximity long enough to transmit the virus. Because behavioral interventions alter the contact network, a comparison of contact networks before and after the intervention can provide information on the effectiveness of the intervention. We coin indicators based on this idea Behavioral Contact Network (BECON) indicators. We examine the performance of three indicators: the Density BECON, based on differences in network density; the Spectral BECON, based on differences in the eigenvector of the adjacency matrix; and the ASPL BECON, based on differences in average shortest path lengths. Using simulations, we show that all three indicators can effectively track the effect of behavioral interventions. Even in conditions with significant amounts of noise, BECON indicators can reliably identify and order effect sizes of interventions. The present paper invites further study of the method as well as practical implementations to test the validity of BECON indicators in real data.</p>
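The core idea behind the Density and ASPL BECON indicators described in the abstract, comparing a network summary statistic before and after an intervention, can be sketched on small adjacency matrices (a rough illustration of the idea only, not the authors' implementation):

```python
from collections import deque

def density(adj):
    """Fraction of possible undirected edges present in an adjacency matrix."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)

def aspl(adj):
    """Average shortest path length over all connected ordered pairs,
    computed by breadth-first search from each node."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for node, d in dist.items() if node != s)
        pairs += len(dist) - 1
    return total / pairs

# Pre-intervention: a fully connected 4-person contact network.
pre = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
# Post-intervention: distancing removes two contacts, leaving a chain.
post = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

density_becon = density(pre) - density(post)  # density drops from 1.0 to 0.5
aspl_becon = aspl(post) - aspl(pre)           # paths lengthen after intervention
```

Larger indicator values here correspond to a stronger intervention effect on the contact structure; the Spectral BECON would analogously compare leading eigenvectors of the two adjacency matrices.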
Section: Theory and Methods | Keywords: Covid-19, contact network, behavioral data science, physical distancing, social distancing, intervention effects, network analysis | Authors: Denny Borsboom, Tessa Blanken, Fabian Dablander, Frenk van Harreveld, Charlotte Tanis, Piet Van Mieghem
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-04 | pp. 1–34 | DOI: 10.35566/jbds/v2n1/p1

A Weighted Residual Bootstrap Method for Multilevel Modeling with Sampling Weights
https://jbds.isdsa.org/jbds/article/view/30
<p>Multilevel modeling is often used to analyze survey data collected with a multistage sampling design. When the selection is informative, sampling weights need to be incorporated in the estimation. We propose a weighted residual bootstrap method as an alternative to the multilevel pseudo-maximum likelihood (MPML) estimators. In a Monte Carlo simulation using two-level linear mixed effects models, the bootstrap method showed advantages over MPML for the estimates and the statistical inferences of the intercept, the slope of the level-2 predictor, and the variance components at level-2. The impact of sample size, selection mechanism, intraclass correlation (ICC), and distributional assumptions on the performance of the methods was examined. The performance of MPML was suboptimal when sample size and ICC were small and when the normality assumption was violated. The bootstrap estimates performed generally well across all the simulation conditions, but had notably suboptimal performance in estimating the covariance component in a random slopes model when sample size and ICCs were large. As an illustration, the bootstrap method is applied to the American data of the OECD’s Programme for International Student Assessment (PISA) survey on math achievement using the R package <em>bootmlm</em>.</p>
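The weighted residual bootstrap itself is defined for multilevel models, but its core idea, refitting the model to data rebuilt from fitted values plus resampled residuals, can be sketched in its simplest single-level form (an unweighted, single-level illustration of the residual bootstrap, not the authors' estimator):

```python
import random

def ols_fit(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def residual_bootstrap_slopes(x, y, n_boot=200, seed=1):
    """Residual bootstrap: resample residuals with replacement, add them
    to the fitted values, and refit; returns the bootstrap slopes."""
    rng = random.Random(seed)
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        slopes.append(ols_fit(x, y_star)[1])
    return slopes

x = [1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
slopes = residual_bootstrap_slopes(x, y)
```

The distribution of `slopes` then yields bootstrap standard errors and confidence intervals; the weighted multilevel version additionally scales the residuals and incorporates the sampling weights at each level.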
Section: Theory and Methods | Keywords: Bootstrap, Informative selection, Multilevel modeling, Sampling weights, Pseudo-maximum likelihood | Authors: Wen Luo, Hok Chio Lai
Copyright (c) 2021 Journal of Behavioral Data Science
2021-12-02 | pp. 89–118 | DOI: 10.35566/jbds/v1n2/p6

Structural Equation Modeling using Stata
https://jbds.isdsa.org/jbds/article/view/29
<p>In this tutorial, you will learn how to fit structural equation models (SEM) using Stata software. SEMs can be fit in Stata using the <strong>sem</strong> command for standard linear SEMs, the <strong>gsem</strong> command for generalized linear SEMs, or by drawing their path diagrams in the SEM Builder. After a brief introduction to Stata, the <strong>sem</strong> command will be demonstrated through a confirmatory factor analysis model, mediation model, group analysis, and a growth curve model, and the <strong>gsem</strong> command will be demonstrated through a random-slope model and a logistic ordinal regression. Materials and datasets are provided online, allowing anyone with Stata to follow along.</p>
Section: Tutorials | Keywords: Structural Equation Modeling, Growth Curve Modeling, Mediation, Software, Stata | Authors: Meghan Cain
Copyright (c) 2021 Journal of Behavioral Data Science
2021-12-02 | pp. 156–177 | DOI: 10.35566/jbds/v1n2/p7

GPS2space: An Open-source Python Library for Spatial Measure Extraction from GPS Data
https://jbds.isdsa.org/jbds/article/view/27
<p>Global Positioning System (GPS) data have become one of the routine data streams collected by wearable devices, cell phones, and social media platforms in this digital age. Such data provide research opportunities in that they may provide contextual information to elucidate where, when, and why individuals engage in and sustain particular behavioral patterns. However, raw GPS data consisting of densely sampled time series of latitude and longitude coordinate pairs do not readily convey meaningful information concerning intra-individual dynamics and inter-individual differences; substantial data processing is required. Raw GPS data need to be integrated into a Geographic Information System (GIS) and analyzed, from which the mobility and activity patterns of individuals can be derived, a process that is unfamiliar to many behavioral scientists. In this tutorial article, we introduce GPS2space, a free and open-source Python library that we developed to facilitate the processing of GPS data, integration with GIS to derive distances from landmarks of interest, and extraction of two spatial features: activity space of individuals and shared space between individuals, such as members of the same family. We demonstrate functions available in the library using data from the Colorado Online Twin Study to explore seasonal and age-related changes in individuals’ activity space and twin siblings’ shared space, as well as gender, zygosity and baseline age-related differences in their initial levels and/or changes over time. We conclude with discussions of other potential uses, caveats, and future developments of GPS2space.</p>
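GPS2space builds on GIS tooling to turn coordinate pairs into distances and spatial features. The kind of low-level computation such libraries abstract away, the great-circle distance between two latitude/longitude pairs, can be sketched with the haversine formula (a generic illustration, not GPS2space's API):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two GPS coordinates,
    using the haversine formula on a spherical Earth."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * R * asin(sqrt(h))

# Distance between two Colorado cities, Denver and Boulder (roughly 39 km).
d = haversine_km(39.7392, -104.9903, 40.0150, -105.2705)
```

Production GIS libraries project coordinates onto a planar reference system instead of assuming a sphere, which matters when aggregating distances into activity-space measures.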
Section: Software | Keywords: Spatial Measure, Twins, Behavior Genetics, Latent Growth Curve Model, Python | Authors: Shuai Zhou, Yanling Li, Guangqing Chi, Junjun Yin, Zita Oravecz, Yosef Bodovski, Naomi P. Friedman, Scott I. Vrieze, Sy-Miin Chow
Copyright (c) 2021 Journal of Behavioral Data Science
2021-11-08 | pp. 127–155 | DOI: 10.35566/jbds/v1n2/p5

Two-step growth mixture model to examine heterogeneity in nonlinear trajectories
https://jbds.isdsa.org/jbds/article/view/21
<p>Empirical researchers are usually interested in investigating the impacts that baseline covariates have when uncovering sample heterogeneity and separating samples into more homogeneous groups. However, a considerable number of studies in the structural equation modeling (SEM) framework start with vague hypotheses about heterogeneity and its possible causes. This suggests that (1) determining and specifying a proper model with covariates is not straightforward, and (2) the exploration process may be computationally intensive, given that models in the SEM framework are usually complicated and the pool of candidate covariates is typically large in the psychological and educational domains where the SEM framework is widely employed. Following Bakk and Kuha (2017), this article presents a two-step growth mixture model (GMM) that examines the relationship between latent classes of nonlinear trajectories and baseline characteristics. Our simulation studies demonstrate that the proposed model is capable of clustering the nonlinear change patterns and of estimating the parameters of interest with little bias, good precision, and appropriate confidence interval coverage. Because the pool of candidate covariates is usually large and highly correlated, this study also proposes implementing exploratory factor analysis (EFA) to reduce the dimension of the covariate space. We illustrate how to use the hybrid method, the two-step GMM and EFA, to efficiently explore the heterogeneity of nonlinear trajectories in longitudinal mathematics achievement data.</p>
Section: Theory and Methods | Keywords: Growth Mixture Models, Nonlinear Trajectories, Individual Measurement Occasions, Covariates, Simulation Studies, Exploratory Factor Analysis | Authors: Jin Liu, Le Kang, Roy T. Sabo, Robert M. Kirkpatrick, Robert A. Perera
Copyright (c) 2021 Journal of Behavioral Data Science
2021-08-28 | pp. 54–88 | DOI: 10.35566/jbds/v1n2/p4

Tree-based Matching on Structural Equation Model Parameters
https://jbds.isdsa.org/jbds/article/view/20
<p>Understanding causal effects of a treatment is often of interest in the social sciences. When treatments cannot be randomly assigned, researchers must ensure that treated and untreated participants are balanced on covariates before estimating treatment effects. Conventional practices are useful in matching such that treated and untreated participants have similar average values on their covariates. However, situations arise in which a researcher may instead want to match on model parameters. We propose an algorithm, Causal M<em>plus</em> Trees, which uses decision trees to match on structural equation model parameters and estimates conditional average treatment effects in each node. We provide a proof of concept using two small simulation studies and demonstrate its application using COVID-19 data.</p>
Section: Theory and Methods | Keywords: matching, structural equation modeling, decision trees, machine learning | Authors: Sarfaraz Serang, James Sears
Copyright (c) 2021
2021-08-28 | pp. 31–53 | DOI: 10.35566/jbds/v1n2/p3

A Note on Wishart and Inverse Wishart Priors for Covariance Matrix
https://jbds.isdsa.org/jbds/article/view/19
<p>For inference involving a covariance matrix, inverse Wishart priors are often used in Bayesian analysis. To help researchers better understand the influence of inverse Wishart priors, we provide a concrete example based on the analysis of a two by two covariance matrix. Recommendations are provided on how to specify an inverse Wishart prior.</p>
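A key fact for understanding the influence of an inverse Wishart prior IW(Ψ, ν) on a p × p covariance matrix is its prior mean, Ψ/(ν − p − 1), which exists for ν > p + 1. A minimal numerical companion for the 2 × 2 case (an illustrative sketch, not code from the article):

```python
def inv_wishart_prior_mean(Psi, nu):
    """Prior mean Psi / (nu - p - 1) of an IW(Psi, nu) prior on a
    p x p covariance matrix; defined only for nu > p + 1."""
    p = len(Psi)
    if nu <= p + 1:
        raise ValueError("prior mean exists only for nu > p + 1")
    c = nu - p - 1
    return [[v / c for v in row] for row in Psi]

# A common weakly informative choice for a 2 x 2 covariance matrix:
# identity scale with nu = p + 2 = 4, so the prior mean is the identity.
Psi = [[1.0, 0.0], [0.0, 1.0]]
mean = inv_wishart_prior_mean(Psi, nu=4)
```

Because the prior mean scales as 1/(ν − p − 1), seemingly innocuous choices of Ψ and ν can pull the posterior covariance toward very different values, which is the kind of influence the note above examines.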
Section: Theory and Methods | Keywords: Wishart distribution, inverse Wishart distribution, prior distribution, covariance matrix | Authors: Zhiyong Zhang
Copyright (c) 2021
2021-08-28 | pp. 119–126 | DOI: 10.35566/jbds/v1n2/p2

Bayesian Approach to Non-ignorable Missingness in Latent Growth Models
https://jbds.isdsa.org/jbds/article/view/18
<p>Latent growth curve models (LGCMs) are becoming increasingly important among growth models because they can effectively capture individuals' latent growth trajectories and also explain the factors that influence such growth by analyzing the repeatedly measured manifest variables. However, as LGCMs grow in complexity, model estimation issues also increase. This research proposes a Bayesian approach to LGCMs to address the perennial problem of almost all longitudinal research, namely, missing data. First, different missingness models are formulated. We focus on non-ignorable missingness in this article. Specifically, these models include latent intercept dependent missingness, latent slope dependent missingness, and potential outcome dependent missingness. For model estimation, this study proposes a full Bayesian approach through a data augmentation algorithm and a Gibbs sampling procedure. Simulation studies are conducted, and the results show that the proposed method accurately recovers model parameters and that mis-specifying the missingness mechanism may result in severely misleading conclusions. Finally, the implications of the approach and future research directions are discussed.</p>
Section: Theory and Methods | Keywords: Bayesian Estimation, Missing Data, Latent Growth Curve Models, Non-ignorable Missingness, Longitudinal Analysis, Multilevel Modeling | Authors: Zhenqiu (Laura) Lu, Zhiyong Zhang
Copyright (c) 2021
2021-08-28 | pp. 1–30 | DOI: 10.35566/jbds/v1n2/p1