Journal of Behavioral Data Science
https://jbds.isdsa.org/jbds
<p><strong>ISSN: 2575-8306 (Print)</strong><br /><strong>ISSN: 2574-1284 (Online)</strong><br /><strong>DOI: <a class="urlextern" title="https://dx.doi.org/10.35566/jbds" href="https://dx.doi.org/10.35566/jbds" rel="nofollow">10.35566/jbds</a></strong></p> <p>The Journal of Behavioral Data Science is a peer-reviewed, open-access journal that provides a free-to-publish platform for researchers and practitioners in data science and data analytics. Publishing in the journal and accessing its content are completely free for both authors and readers, which allows the widest possible dissemination of research and promotes interdisciplinary collaboration and innovation in behavioral data science.</p>

International Society for Data Science and Analytics

Modeling Data with Measurement Errors but without Predefined Metrics: Fact versus Fallacy
https://jbds.isdsa.org/jbds/article/view/96
<p>Data in social and behavioral sciences typically contain measurement errors and also do not have predefined metrics. Structural equation modeling (SEM) is commonly used to analyze such data. This article discusses issues in latent-variable modeling as compared to regression analysis with composite scores. Via logical reasoning and analytical results, as well as the analyses of two real datasets, several misconceptions related to bias and accuracy of parameter estimates, standardization of variables, and result interpretation are clarified. The results are expected to facilitate a better understanding of the strengths and limitations of SEM and regression analysis with weighted composites, and to advance social and behavioral data science.</p>
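As background for the attenuation issue this abstract raises, the classical effect of measurement error on a correlation can be illustrated with Spearman's attenuation formula (a textbook result, not code from the article itself):

```python
import math

def attenuated_correlation(r_true, rel_x, rel_y):
    """Spearman's attenuation formula: the correlation between two
    error-laden composite scores equals the true-score correlation
    scaled down by the square root of the two reliabilities."""
    return r_true * math.sqrt(rel_x * rel_y)

# A true correlation of .50, measured with reliability .80 on both
# variables, is observed as only .40.
r_obs = attenuated_correlation(0.50, 0.80, 0.80)
print(round(r_obs, 2))  # 0.4
```

Latent-variable SEM corrects for this attenuation by modeling the error variance explicitly, which is one reason its estimates differ from composite-score regression.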
Theory and Methods | Keywords: Measurement error; Attenuation; Standardization; Scales of latent variables | Authors: Ke-Hai Yuan, Zhiyong Zhang
Copyright (c) 2024 Journal of Behavioral Data Science
2024-08-18 | pp. 1-28 | DOI: 10.35566/jbds/yuan

greekLetters: Routines for Writing Greek Letters and Mathematical Symbols on the RStudio and RGui
https://jbds.isdsa.org/jbds/article/view/92
<p>This is a brief description of the R package <em>greekLetters</em>. In short, <em>greekLetters</em> is a package for displaying Greek letters and various mathematical symbols in RStudio and RGui environments.</p>
Software | Keywords: R; Statistical notation; Mathematical symbols; Greek letters | Author: Kévin Allan Sales Rodrigues
Copyright (c) 2024 Journal of Behavioral Data Science
2024-08-18 | pp. 1-6 | DOI: 10.35566/jbds/rodrigues

Rephrasing the Lengthy and Involved Proof of Kristof’s Theorem: A Tutorial with Some New Findings
https://jbds.isdsa.org/jbds/article/view/76
<p>Kristof’s theorem gives the global maximum and minimum of the trace of some matrix products without using calculus or Lagrange multipliers, and it has various applications in psychometrics and multivariate analysis. However, the theorem has been underutilized despite its great practical value, possibly in part because of its lengthy and involved proof. In this tutorial, some known or new lemmas are rephrased or provided to clarify the essential points of the proof. ten Berge’s generalized Kristof theorem is also addressed. Then, modified Kristof and ten Berge theorems using parent orthonormal matrices are shown, which may be of use in examining the properties of the Kristof and ten Berge theorems.</p>
Theory and Methods | Keywords: von Neumann’s trace inequality; Generalized Kristof theorem; Suborthonormal; Semiorthonormal; Singular value decomposition | Author: Haruhiko Ogasawara
Copyright (c) 2024 Journal of Behavioral Data Science
2024-07-27 | pp. 1-22 | DOI: 10.35566/jbds/ogasawara2

Loss Aversion Distribution: The Science Behind Loss Aversion Exhibited by Sellers of Perishable Good
https://jbds.isdsa.org/jbds/article/view/74
<p>This research introduces the concept of the loss aversion distribution, a pioneering framework designed for the analysis of consumer behavior. Departing from the conventions of traditional exponential models, this innovative approach incorporates a non-memoryless characteristic, which modulates the consumer's response to loss aversion throughout the product's life cycle. This modulation is achieved by a variable exponent influenced by the parameter $b$, representing the psychological impact of loss aversion, and the constant $k$, which reflects the market value of the good at the time of manufacture. Together, these parameters adeptly encapsulate the dynamic nature of consumer loss aversion from the moment of manufacture to the point of expiry. The model elucidates an initial muted response from consumers at the onset of ownership, which then intensifies during the mid-life cycle of the product, before ultimately diminishing as the product approaches its expiry. Through a meticulous derivative analysis of the probability density function, the study delineates the distribution's key properties, including its monotonicity, boundedness within the interval [0, 1], and its adherence to non-negativity. This framework not only enhances our comprehension of consumer behavior in relation to perishable goods but also paves the way for further investigations into psychometrics and the intricacies of loss aversion modeling.</p>
Theory and Methods | Keywords: Loss aversion; Prospect theory; Probability theory | Author: Daniel Koh
Copyright (c) 2024 Journal of Behavioral Data Science
2024-03-24 | pp. 63-80 | DOI: 10.35566/jbds/koh

A Tutorial on Bayesian Linear Regression with Compositional Predictors Using JAGS
https://jbds.isdsa.org/jbds/article/view/72
<div id="magicparlabel-2063043" class="abstract">This tutorial offers an exploration of advanced Bayesian methodologies for compositional data analysis, specifically the Bayesian Lasso and Bayesian Spike-and-Slab Lasso (SSL) techniques. Our focus is on a novel Bayesian methodology that integrates Lasso and SSL priors, enhancing both parameter estimation and variable selection for linear regression with compositional predictors. The tutorial is structured to streamline the learning process, breaking down complex analyses into a series of straightforward steps. We demonstrate these methods using R and JAGS, employing simulated datasets to illustrate key concepts. Our objective is to provide a clear and comprehensive understanding of these sophisticated Bayesian techniques, preparing readers to adeptly navigate and apply these methods in their own compositional data analysis endeavors.</div>
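Compositional predictors live on a simplex (positive parts summing to one), so they are typically mapped to unconstrained coordinates before regression. One common choice is the centered log-ratio transform — a standard sketch of the idea, not necessarily the transform the tutorial itself uses:

```python
import math

def clr(composition):
    """Centered log-ratio (clr) transform: log of each part minus the
    mean of the logs. Maps a composition out of the simplex; the
    transformed coordinates sum to zero by construction."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [lx - mean_log for lx in logs]

z = clr([0.2, 0.3, 0.5])
# the transformed parts sum to (numerically) zero
```

Lasso-type priors are then placed on regression coefficients of such transformed coordinates, often with a sum-to-zero constraint on the coefficients.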
Tutorials | Keywords: Bayesian analysis; Compositional data; Lasso; Spike and slab lasso | Authors: Yunli Liu, Xin Tong
Copyright (c) 2024 Journal of Behavioral Data Science
2024-01-28 | pp. 81-104 | DOI: 10.35566/jbds/tongliu

Stability and Spread: Transition Metrics that are Robust to Time Interval Misspecification
https://jbds.isdsa.org/jbds/article/view/70
<p>Intensive longitudinal data collected via ecological momentary assessment (EMA) are often sampled with unequal time spacing between surveys. Given the popularity of EMA data, it is important to understand whether time series methods are robust to such time interval misspecification. The present study demonstrates via simulation that stability and spread—two metrics for quantifying different aspects of transitioning behavior within multivariate binary time series data—are unbiased when applied to data that are collected along an off/on burst sampling schedule, a between-person random sampling schedule, and a within-person random sampling schedule. These results held in randomly generated data with differing numbers of time series variables (k=10 and k=20) and in data simulated based on the proportions of observed data from a prior EMA study. Further, stability and spread demonstrated approximately 95% coverage for all between- and within-person random sampling schedules. However, coverage for stability and spread was poor in the off/on burst sampling schedules (around 67%). We also applied these transition metrics—which measure repetitiveness and diversity of transitions, respectively—to a foundational EMA dataset that was among the first to show that adults regularly use many different emotion regulation strategies throughout their daily life (Heiy & Cheavens, 2014). As hypothesized, we found a stronger positive relation between mood and higher stability/lower spread in emotion regulation among people with fewer depressive symptoms than those with more depressive symptoms. Taken together, stability and spread appear to be appropriate metrics to use with data collected using common unequal time spacing conditions and can be used to uncover theoretically consistent insights in real psychosocial data.</p>
Theory and Methods | Keywords: Time series; Ecological momentary assessment; Transitions; Binary data; Emotion regulation; Switching | Authors: Katharine Daniel, Robert Moulder, Matthew Southward, Jennifer Cheavens, Steven Boker
Copyright (c) 2024 Journal of Behavioral Data Science
2024-06-11 | pp. 19-44 | DOI: 10.35566/jbds/daniel

Conducting Meta-analyses of Proportions in R
https://jbds.isdsa.org/jbds/article/view/60
<p>Meta-analysis of proportions has been widely adopted across various scientific disciplines as a means to estimate the prevalence of phenomena of interest. However, there is a lack of comprehensive tutorials demonstrating the proper execution of such analyses using the R programming language. The objective of this study is to bridge this gap and provide an extensive guide to conducting a meta-analysis of proportions using R. Furthermore, we offer a thorough critical review of the methods and tests involved in conducting a meta-analysis of proportions, highlighting several common practices that may yield biased estimations and misleading inferences. We illustrate the meta-analytic process in five stages: (1) preparation of the R environment; (2) computation of effect sizes; (3) quantification of heterogeneity; (4) visualization of heterogeneity with the forest plot and the Baujat plot; and (5) explanation of heterogeneity with moderator analyses. In the last section of the tutorial, we address the misconception of assessing publication bias in the context of meta-analysis of proportions. The provided code offers readers three options to transform proportional data (e.g., the double arcsine method). The tutorial presentation is conceptually oriented and formula usage is minimal. We will use a published meta-analysis of proportions as an example to illustrate the implementation of the R code and the interpretation of the results.</p>
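The double arcsine method mentioned above is the Freeman-Tukey transform. A minimal sketch of the parameterization commonly used by meta-analysis software (variable names are ours, not the tutorial's):

```python
import math

def freeman_tukey(events, n):
    """Freeman-Tukey double arcsine transform of a proportion
    (events out of n), used to stabilize variances in meta-analyses
    of proportions. Returns the transformed value and its approximate
    sampling variance, 1 / (4n + 2)."""
    t = 0.5 * (math.asin(math.sqrt(events / (n + 1)))
               + math.asin(math.sqrt((events + 1) / (n + 1))))
    var = 1.0 / (4 * n + 2)
    return t, var
```

In R, the same quantity is available via `metafor::escalc(measure = "PFT")`; the pooled transformed estimate is usually back-transformed to a proportion for reporting.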
Tutorials | Keywords: Meta-analysis of proportions; Heterogeneity; Meta-regression; Double arcsine transformation; Baujat plot | Author: Naike Wang
Copyright (c) 2023 Journal of Behavioral Data Science
2023-11-07 | pp. 64-126 | DOI: 10.35566/jbds/v3n2/wang

Robust Bayesian growth curve modeling: A tutorial using JAGS
https://jbds.isdsa.org/jbds/article/view/67
<p>Latent growth curve models (LGCM) are widely used in longitudinal data analysis, and robust methods can be used to model error distributions for non-normal data. This tutorial introduces how to fit linear, nonlinear, and quadratic growth curve models under the Bayesian framework, and uses examples to illustrate how to model errors using t, exponential power, and skew-normal distributions. The JAGS model code is provided and implemented through the R package runjags. Model diagnostics and comparisons are briefly discussed.</p>
Tutorials | Keywords: Robust Growth Curve Modeling; Bayesian Estimation; Structural Equation Modeling; JAGS | Author: Ruoxuan Li
Copyright (c) 2023 Journal of Behavioral Data Science
2023-09-24 | pp. 43-63 | DOI: 10.35566/jbds/v3n2/li

A Novel Approach for Identifying Unobserved Heterogeneity in Longitudinal Growth Trajectories Using Natural Cubic Smoothing Splines
https://jbds.isdsa.org/jbds/article/view/65
<p>A novel algorithmic modeling method is proposed to determine dissimilarities between subjects for longitudinal data clustering using natural cubic smoothing splines. Although various modeling techniques have to date been suggested for conducting such analyses, a major problem with many of these approaches is that they often impose overly restrictive assumptions. As a consequence, potentially problematic interpretations of data clustering regarding both the number and the nature of the growth trajectory patterns can occur. The proposed method is shown to be highly effective in identifying heterogeneity of growth trajectories in settings with data exhibiting complex nonlinear longitudinal patterns and without imposing potentially problematic constraints on the model.</p>
Theory and Methods | Keywords: Unobserved heterogeneity; Latent class detection; Natural cubic smoothing splines | Authors: Katerina M. Marcoulides, Laura Trinchera
Copyright (c) 2024 Journal of Behavioral Data Science
2024-05-12 | pp. 1-18 | DOI: 10.35566/jbds/marcoulides

Lasso and Group Lasso with Categorical Predictors: Impact of Coding Strategy on Variable Selection and Prediction
https://jbds.isdsa.org/jbds/article/view/64
<p>Machine learning methods are being increasingly adopted in behavioral research. Lasso regression performs variable selection and regularization, and is particularly appealing to behavioral researchers because of its connection to linear regression. Researchers may expect properties of linear regression to translate to lasso, but we demonstrate that this assumption is problematic for models with categorical predictors. Specifically, we demonstrate that while the coding strategy used for categorical predictors does not impact the performance of linear regression, it does impact lasso’s performance. Group lasso is an alternative to lasso for models with categorical predictors. We investigate the discrepancy between lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, while group lasso performs consistent variable selection but has different prediction accuracy. Using a Monte Carlo simulation, we demonstrate a specific case where group lasso tends to include many variables when few are needed, leading to overfitting. We conclude with recommended solutions to this issue and future directions of exploration to improve the implementation of machine learning approaches in behavioral science. This project shows that when using lasso and group lasso with categorical predictors, the choice of coding strategy should not be ignored.</p>
Theory and Methods | Keywords: Lasso regression; Categorical predictors; Regularization | Authors: Yihuan Huang, Tristan Tibbe, Amy Tang, Amanda Montoya
Copyright (c) 2023 Journal of Behavioral Data Science
2024-01-26 | pp. 15-42 | DOI: 10.35566/jbds/v3n2/montoya

A Proof-of-Concept Study Demonstrating How FITBIR Datasets Can be Harmonized to Examine Posttraumatic Stress Disorder-Traumatic Brain Injury Associations
https://jbds.isdsa.org/jbds/article/view/63
<p><strong>Background: </strong>Although posttraumatic stress disorder (PTSD) is common following traumatic brain injury (TBI), the specific associations between these conditions are difficult to elucidate, in part due to the diverse methodologies, small samples, and limited longitudinal data in the extant literature.</p> <p><strong>Objective: </strong>Conduct a proof-of-concept study demonstrating our ability to compile patient-level TBI data from shared studies in the Federal Interagency Traumatic Brain Injury Research (FITBIR) Informatics System to address these shortcomings and improve our understanding of TBI outcomes, including rates of PTSD comorbidity.</p> <p><strong>Method</strong>: We searched the FITBIR database for shared studies reporting rates of probable PTSD among participants with no TBI, history of mild TBI, or history of moderate/severe TBI. We merged and harmonized data across the relevant studies and analyzed rates of probable PTSD across TBI history and severity categories.</p> <p><strong>Results: </strong>Four FITBIR studies with 2,312 participants included PTSD outcome data. The final sample for comparative analyses comprised 1,633 participants from two studies with TBI group comparison data. Approximately 79% had a history of mild TBI, and 32-37% screened positive for probable PTSD. Participants with a history of mild TBI had 2.8 times greater odds of probable PTSD compared to those without TBI (95% CI: 2.0, 3.7).</p> <p><strong>Conclusions: </strong>Only two FITBIR studies reported data examining PTSD outcomes for mild TBI as of January 2021. The analyses are consistent with prior literature, suggesting mild TBI is associated with higher rates of probable PTSD than no TBI. This study developed the methods, shared the harmonization and analysis code, and publicly shared the TBI and PTSD meta-dataset back to FITBIR for dissemination through their website, allowing future research teams to update these and other related analyses as more studies are contributed to and shared via the FITBIR platform.</p>
Theory and Methods | Keywords: Traumatic brain injury; Posttraumatic stress disorder; Evidence synthesis; Data repository; Meta-data | Authors: Maya O'Neil, David Cameron, Kate Clauss, Danielle Krushnic, William Baker-Robinson, Sara Hannon, Tamara Cheney, Josh Kaplan, Lawrence Cook, Meike Niederhausen, Miranda Pappas, David Cifu
Copyright (c) 2024 Journal of Behavioral Data Science
2024-04-25 | pp. 45-62 | DOI: 10.35566/jbds/oneil

Considering the Distributional Form of Zeroes When Calculating Mediation Effects with Zero-Inflated Count Outcomes
https://jbds.isdsa.org/jbds/article/view/58
<p>Recent work has demonstrated how to calculate conditional mediated effects for mediation models with zero-inflated count outcomes in a non-causal framework (O’Rourke & Vazquez, 2019); however, those formulas do not distinguish between logistic and count portions of the data distribution when calculating mediated effects separately for zeroes and counts. When calculating conditional mediated effects for the counts in a zero-inflated count outcome Y, the <em>b</em> path should use the partial derivative of the log-linear regression equation for X and M predicting Y. When calculating conditional mediated effects for the zeroes, the <em>b</em> path should use the partial derivative of the logistic regression equation for X and M predicting Y instead of the log-linear equation. This paper presents adjustments to the analytical formulas of conditional mediated effects for mediation with zero-inflated count outcomes when zeroes and counts are differentially predicted. Using a Monte Carlo simulation, we also empirically show that these adjustments produce different results than when the distributional form of zeroes is ignored.</p>
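The adjustment described above can be sketched numerically. For a log-linear count model with mean exp(b0 + b1·x + b2·m), the partial derivative with respect to the mediator is b2·exp(·); for the logistic zero portion it is g2·p·(1 − p). Coefficient names here are illustrative, not the paper's notation:

```python
import math

def count_b_path(b0, b1, b2, x, m):
    """b path for the count portion: partial derivative of the
    log-linear mean exp(b0 + b1*x + b2*m) with respect to m."""
    return b2 * math.exp(b0 + b1 * x + b2 * m)

def zero_b_path(g0, g1, g2, x, m):
    """b path for the zero portion: partial derivative of the logistic
    probability of an excess zero with respect to m, g2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(g0 + g1 * x + g2 * m)))
    return g2 * p * (1.0 - p)
```

Because the two derivatives have different functional forms, using the log-linear derivative for both portions — the practice this paper corrects — conflates the two conditional mediated effects.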
Theory and Methods | Keywords: Mediation analysis; Count outcomes; Zero-inflation; ZIP; ZINB; Hurdle models | Authors: Holly O'Rourke, Da Eun Han
Copyright (c) 2023 Journal of Behavioral Data Science
2023-11-10 | pp. 1-14 | DOI: 10.35566/jbds/v3n2/orourke

API Face Value
https://jbds.isdsa.org/jbds/article/view/57
<p>Emotion recognition application programming interface (API) is a recent advancement in computing technology that synthesizes computer vision, machine-learning algorithms, deep-learning neural networks, and other information to detect and label human emotions. The strongest iterations of this technology are produced by technology giants with large cloud infrastructure (e.g., Google and Microsoft), boasting high true positive rates. We review the current status of applications of emotion recognition API in psychological research and find that, despite evidence of spatial, age, and race bias effects, API is improving the accessibility of clinical and educational research. Specifically, emotion detection software can assist individuals with emotion-related deficits (e.g., Autism Spectrum Disorder, Attention-Deficit/Hyperactivity Disorder, Alexithymia). API has been incorporated in various computer-assisted interventions for Autism, where it has been used to diagnose, train, and monitor emotional responses to one's environment. We identify API's potential to enhance interventions in other emotional dysfunction populations and to address various professional needs. Future work should aim to address the bias limitations of API software and expand its utility in subfields of clinical, educational, neurocognitive, and industrial-organizational psychology.</p>
Literature Review | Keywords: API; Emotion Recognition; Machine Learning; ASD; ADHD; Alexithymia | Authors: Austin Wyman, Zhiyong Zhang
Copyright (c) 2022 Journal of Behavioral Data Science
2023-07-13 | pp. 59-69 | DOI: 10.35566/jbds/v3n1/wyman

On Some Known Derivations and New Ones for The Wishart Distribution: A Didactic
https://jbds.isdsa.org/jbds/article/view/56
<p>The proofs of the probability density function (pdf) of the Wishart distribution tend to be complicated, relying on geometric viewpoints, tedious Jacobians, and algebra that is not self-contained. In this paper, some known proofs and simple new ones for the uncorrelated and correlated cases are provided with didactic explanations. For the new derivation of the uncorrelated case, an elementary direct derivation of the distribution of the Bartlett-decomposed matrix is provided. In the derivation of the correlated case from the uncorrelated one, simple methods, including a new one, are shown.</p>
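For reference, the density whose derivations the paper rephrases is, in standard notation for a p×p matrix W ~ W_p(n, Σ) with n > p − 1 degrees of freedom:

```latex
f(W) \;=\; \frac{|W|^{(n-p-1)/2} \,
  \exp\!\left\{-\tfrac{1}{2}\,\operatorname{tr}\!\left(\Sigma^{-1} W\right)\right\}}
  {2^{np/2}\, |\Sigma|^{n/2}\, \Gamma_p\!\left(n/2\right)}
```

where Γ_p denotes the multivariate gamma function.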
Theory and Methods | Keywords: Jacobian; Multivariate normality; Probability density function (pdf); Triangular matrix; Bartlett decomposition | Author: Haruhiko Ogasawara
Copyright (c) 2022 Journal of Behavioral Data Science
2023-06-21 | pp. 34-58 | DOI: 10.35566/jbds/v3n1/ogasawara

Using Bayesian Piecewise Growth Curve Models to Handle Complex Nonlinear Trajectories
https://jbds.isdsa.org/jbds/article/view/51
<p>Bayesian growth curve modeling is a popular method for studying longitudinal data. In this study, we discuss a flexible extension, the Bayesian piecewise growth curve model (BPGCM), which allows the researcher to break up a trajectory into phases joined at change points called <em>knots</em>. By fitting BPGCMs, the researcher can specify three or more phases of growth without concern for model identification. Our goal is to provide substantive researchers with a guide for implementing this important class of models. We present a simple application of Bayesian linear BPGCMs to children's math achievement. Our tutorial includes M<em>plus</em> code, strategies for specifying knots, and guidance on interpreting model selection and fit indices. Extensions of the model are discussed.</p>
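The mean structure of a two-phase linear piecewise model can be written with a simple spline basis — a sketch of the general idea, with illustrative parameter names rather than the article's:

```python
def piecewise_trajectory(t, b0, b1, b2, knot):
    """Mean trajectory of a two-phase linear piecewise growth model:
    intercept b0, pre-knot slope b1, and b2 = change in slope after
    the change point ('knot'). The phases join continuously at the knot."""
    return b0 + b1 * t + b2 * max(0.0, t - knot)

# Slope is b1 before the knot and b1 + b2 after it.
print(piecewise_trajectory(2, 1.0, 0.5, 1.0, 3))  # 2.0
print(piecewise_trajectory(5, 1.0, 0.5, 1.0, 3))  # 5.5
```

Adding further `max(0, t - knot_k)` terms gives three or more phases, which is the case the BPGCM handles without identification concerns.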
Theory and Methods | Keywords: Piecewise Growth Curve Models; Bayesian SEM; Model Selection | Authors: Luca Marvin, Haiyan Liu, Sarah Depaoli
Copyright (c) 2022 Journal of Behavioral Data Science
2023-07-13 | pp. 1-33 | DOI: 10.35566/jbds/v3n1/marvin

Predicting Dyslexia with Machine Learning: A Comprehensive Review of Feature Selection, Algorithms, and Evaluation Metrics
https://jbds.isdsa.org/jbds/article/view/53
<p>This literature review explores the use of machine learning-based approaches for the diagnosis and treatment of dyslexia, a learning disorder that affects reading and spelling skills. Various machine learning models, such as artificial neural networks (ANNs), support vector machines (SVMs), and decision trees, have been used to classify individuals as either dyslexic or non-dyslexic based on functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data. These models have shown promising results for early detection and personalized treatment plans. However, further research is needed to validate these approaches and identify optimal features and models for dyslexia diagnosis and treatment.</p>
Literature Review | Keywords: SVM; EEG; Dyslexia | Author: Velmurugan S
Copyright (c) 2022 Journal of Behavioral Data Science
2023-07-28 | pp. 70-83 | DOI: 10.35566/jbds/v3n1/s

Bayesian IRT in JAGS: A Tutorial
https://jbds.isdsa.org/jbds/article/view/54
<p>Item response modeling is common throughout psychology and education in assessments of intelligence, psychopathology, and ability. The current paper provides a tutorial on estimating the two-parameter logistic and graded response models in a Bayesian framework and provides an introduction to evaluating convergence and model fit in this framework. Example data are drawn from depression items in the 2017 wave of the National Longitudinal Survey of Youth, and example code is provided for JAGS and implemented through R using the runjags package. The aim of this paper is to provide readers with the necessary information to conduct Bayesian IRT analyses in JAGS.</p>
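The two-parameter logistic (2PL) model estimated in the tutorial has a compact closed form. A minimal sketch (standard IRT notation, independent of the tutorial's JAGS code):

```python
import math

def two_pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of endorsing an
    item with discrimination a and difficulty b for a person at
    latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta equal to the item difficulty, the probability is exactly .5.
print(two_pl(0.0, 1.0, 0.0))  # 0.5
```

In the Bayesian setting, priors are placed on a, b, and theta, and JAGS samples the joint posterior; the likelihood is exactly this Bernoulli probability.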
Tutorials | Keywords: Logistic Response Model; Item Response Theory; Bayesian Method; JAGS Tutorial | Author: Kenneth McClure
Copyright (c) 2022 Journal of Behavioral Data Science
2023-03-27 | pp. 84-107 | DOI: 10.35566/jbds/v3n1/mccure

A Tutorial on Bayesian Analysis of Count Data Using JAGS
https://jbds.isdsa.org/jbds/article/view/49
<p>In behavioral studies, the frequency of a particular behavior or event is often collected, and the acquired data are referred to as count data. This tutorial introduces readers to Poisson regression, a more appropriate approach for such data. Count data with excessive zeros also often occur in behavioral studies, and models such as zero-inflated or hurdle models can be employed to handle the zero-inflation. In this tutorial, we aim to cover the necessary fundamentals of these methods and equip readers with the tools to apply them in JAGS. Examples of the implementation of the models in JAGS from within R are provided for demonstration purposes.</p>
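The zero-inflated Poisson (ZIP) model the tutorial covers mixes a point mass at zero with an ordinary Poisson count. Its probability mass function is short enough to state directly (a standard formulation, not the tutorial's JAGS code):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: with probability pi the observation
    is a structural zero; otherwise it is drawn from Poisson(lam).
    Zeros can therefore arise from either component."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * pois
    return (1.0 - pi) * pois
```

A hurdle model differs in that all zeros come from the binary component, and the counts follow a zero-truncated distribution.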
Tutorials | Keywords: Count data; Zero-inflation; Poisson regression; ZIP model; Hurdle model | Author: Sijing Shao
Copyright (c) 2022 Journal of Behavioral Data Science
2022-12-14 | pp. 156-173 | DOI: 10.35566/jbds/v2n2/shao

Handling Ignorable and Non-ignorable Missing Data through Bayesian Methods in JAGS
https://jbds.isdsa.org/jbds/article/view/48
<p>With the prevalence of missing data in social science research, it is necessary to use methods for handling missing data. One framework in which data with missing values can still be used for parameter estimation is the Bayesian framework. In this tutorial, different missing data mechanisms, including Missing Completely at Random, Missing at Random, and Missing Not at Random, are introduced. Methods for estimating models with missing values under the Bayesian framework for both ignorable and non-ignorable missingness are also discussed. A structural equation model on data from the Advanced Cognitive Training for Independent and Vital Elderly study is used as an illustration of how to fit missing data models in JAGS.</p>
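The three mechanisms introduced in the tutorial (Rubin's taxonomy) can be illustrated with a toy missingness generator — names, rates, and thresholds here are ours, purely for illustration:

```python
import random

def make_missing(y, x, mechanism, rate=0.3, seed=1):
    """Illustrative missingness generators: MCAR deletes y completely
    at random; MAR deletes y based only on the observed covariate x;
    MNAR deletes y based on the (unobserved) value of y itself."""
    rng = random.Random(seed)
    out = []
    for yi, xi in zip(y, x):
        if mechanism == "MCAR":
            drop = rng.random() < rate
        elif mechanism == "MAR":
            drop = xi > 0 and rng.random() < 2 * rate
        else:  # MNAR
            drop = yi > 0 and rng.random() < 2 * rate
        out.append(None if drop else yi)
    return out
```

MCAR and MAR missingness are ignorable given the observed data, so a standard Bayesian model fit to the observed values suffices; MNAR requires jointly modeling the missingness indicator.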
Tutorials | Keywords: Missing Data; Bayesian Analysis; Structural Equation Modeling | Author: Ziqian Xu
Copyright (c) 2022 Journal of Behavioral Data Science
2022-12-13 | pp. 99-126 | DOI: 10.35566/jbds/v2n2/xu

A Tutorial on Bayesian Latent Class Analysis Using JAGS
https://jbds.isdsa.org/jbds/article/view/47
<p>This tutorial introduces readers to latent class analysis (LCA) as a model-based approach to understanding the unobserved heterogeneity in a population. Given the growing popularity of LCA, we aim to equip readers with theoretical fundamentals as well as computational tools. We outline some potential pitfalls of LCA and suggest related solutions. Moreover, we demonstrate how to conduct frequentist and Bayesian LCA in R with real and simulated data. To ease learning, the analysis is broken down into a series of simple steps. Beyond the simple LCA, two extensions, mixed-model LCA and growth curve LCA, are provided to aid readers’ transition to more advanced models. The complete R code and data set are provided.</p>
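At its core, LCA treats each response pattern as a finite mixture over latent classes with locally independent items. A minimal sketch of that likelihood for binary items (standard LCA formulation, not the tutorial's R code):

```python
def lca_likelihood(y, class_probs, item_probs):
    """Likelihood of one binary response pattern y under an LCA model:
    a mixture over latent classes, where within each class the items
    are independent Bernoulli variables (local independence)."""
    total = 0.0
    for pi_c, probs in zip(class_probs, item_probs):
        contrib = pi_c
        for yj, pj in zip(y, probs):
            contrib *= pj if yj == 1 else (1.0 - pj)
        total += contrib
    return total

# Two classes (weights .6/.4), two items with class-specific
# endorsement probabilities.
lik = lca_likelihood([1, 0], [0.6, 0.4], [[0.9, 0.2], [0.3, 0.7]])
```

Estimation then maximizes (frequentist) or samples (Bayesian) the product of these pattern likelihoods over all respondents.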
Tutorials | Keywords: Latent class analysis; Mixture models; Bayesian analysis | Author: Meng Qiu
Copyright (c) 2022 Journal of Behavioral Data Science
2022-12-04 | pp. 127-155 | DOI: 10.35566/jbds/v2n2/qiu

The Performances of Gelman-Rubin and Geweke's Convergence Diagnostics of Monte Carlo Markov Chains in Bayesian Analysis
https://jbds.isdsa.org/jbds/article/view/45
<p>Bayesian statistics have been widely used given the development of Markov chain Monte Carlo sampling techniques and the growth of computational power. A major challenge of Bayesian methods that has not yet been fully addressed is how to appropriately evaluate the convergence of the random samples to the target posterior distributions. In this paper, we focus on Gelman and Rubin's diagnostic (PSRF), Brooks and Gelman's diagnostic (MPSRF), and Geweke's diagnostics, and compare the Type I and Type II error rates of seven convergence criteria: (1) MPSRF > 1.1; (2) any upper bound of PSRF is larger than 1.1; (3) more than 5% of the upper bounds of PSRFs are larger than 1.1; (4) any PSRF is larger than 1.1; (5) more than 5% of PSRFs are larger than 1.1; (6) any Geweke test statistic is larger than 1.96 or smaller than -1.96; and (7) more than 5% of Geweke test statistics are larger than 1.96 or smaller than -1.96. Based on the simulation results, we recommend the upper bound of PSRF if only one diagnostic can be chosen. When the number of estimated parameters is large, between the per-parameter diagnostic (i.e., PSRF) and the multivariate diagnostic (i.e., MPSRF), we recommend the upper bound of PSRF over MPSRF. Additionally, we do not suggest claiming convergence at the analysis level while allowing a small proportion of the parameters to have significant convergence diagnosis results.</p>
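The PSRF compared throughout this paper is computed from between-chain and within-chain variances. A minimal sketch of the point estimate (the basic Gelman-Rubin formula, without the sampling-variability correction that yields the upper bound the authors recommend):

```python
def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for one parameter,
    given several equal-length chains. Compares the between-chain
    variance B and the mean within-chain variance W; values near 1
    suggest the chains have mixed."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Packages such as coda report this quantity (and its upper confidence bound) per parameter, which is the basis for criteria (2)-(5) above.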
Theory and Methods | Keywords: Convergence diagnostics; Bayesian analysis; Gelman-Rubin diagnostic; Geweke diagnostic | Authors: Han Du, Zijun Ke, Ge Jiang, Sijia Huang
Copyright (c) 2022 Journal of Behavioral Data Science
2022-11-14 | pp. 47-72 | DOI: 10.35566/jbds/v2n2/p3

Relative Predictive Performance of Treatments of Ordinal Outcome Variables across Machine Learning Algorithms and Class Distributions
https://jbds.isdsa.org/jbds/article/view/43
<p>Ordinal variables, such as those measured on a five-point Likert scale, are ubiquitous in the behavioral sciences. However, machine learning methods for modeling ordinal outcome variables (i.e., ordinal classification) are not as well-developed or widely utilized, compared to classification and regression methods for modeling nominal and continuous outcomes, respectively. Consequently, ordinal outcomes are often treated “naively” as nominal or continuous outcomes in practice. This study builds upon previous literature that has examined the predictive performance of such naïve approaches of treating ordinal outcome variables compared to ordinal classification methods in machine learning. We conducted a Monte Carlo simulation study to systematically assess the relative predictive performance of an ordinal classification approach proposed by Frank and Hall (2001) against naïve approaches according to two key factors that have received limited attention in previous literature: (1) the machine learning algorithm being used to implement the approaches and (2) the class distribution of the ordinal outcome variable. The consideration of these important, practical factors expands our knowledge on the consequences of naïve treatments of ordinal outcomes, which are shown in this study to vary substantially according to these factors. Given the ubiquity of ordinal measures coupled with the growing presence of machine learning applications in the behavioral sciences, these are important considerations for building high-performing predictive models in the field.</p>
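The Frank and Hall (2001) approach evaluated here decomposes a K-class ordinal problem into K − 1 binary "is y greater than class k?" classifiers, then recombines their probabilities by differencing. A sketch of the recombination step (the binary classifiers themselves can be any algorithm):

```python
def frank_hall_probs(gt_probs):
    """Recover ordinal class probabilities from the Frank & Hall (2001)
    decomposition: gt_probs[k] is a binary classifier's estimate of
    P(y > class k), for k = 0..K-2. Returns P(y = k) for each of the
    K ordered classes, with differences clipped at zero."""
    probs = [1.0 - gt_probs[0]]
    for k in range(1, len(gt_probs)):
        probs.append(max(0.0, gt_probs[k - 1] - gt_probs[k]))
    probs.append(gt_probs[-1])
    return probs

# Four ordered classes from three cumulative probabilities.
p = frank_hall_probs([0.8, 0.4, 0.1])
```

The predicted class is then the argmax of these recovered probabilities; note the clipping is needed because independently trained binary classifiers need not produce monotone cumulative estimates.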
Theory and Methods | Keywords: Ordinal classification; Machine learning; Predictive performance; Class imbalance; Measurement scale | Authors: Honoka Suzuki, Oscar Gonzalez
Copyright (c) 2022 Journal of Behavioral Data Science
2022-12-16 | pp. 73-98 | DOI: 10.35566/jbds/v2n2/suzuki

A New Bayesian Structural Equation Modeling Approach with Priors on the Covariance Matrix Parameter
https://jbds.isdsa.org/jbds/article/view/41
<p>Bayesian inference for structural equation models (SEMs) is increasingly popular in social and psychological sciences owing to its flexibility to adapt to more complex models and its ability to include prior information when available. However, there are two major hurdles in using traditional Bayesian SEM in practice: (1) the information nested in the prior distributions is hard to control, and (2) the MCMC iterative procedures naturally lead to Markov chains with serial dependence, whose convergence is often difficult to diagnose. In this study, we present an alternative procedure for Bayesian SEM that aims to address these two challenges. In the new Bayesian SEM procedure, we specify a prior distribution on the population covariance matrix parameter Σ and obtain its posterior distribution p(Σ | data). We then construct a posterior distribution of the model parameters θ of the hypothesized SEM by transforming the posterior distribution of Σ into a distribution of θ. The new procedure eases the practice of Bayesian SEM significantly and offers better control over the information nested in the prior distribution. We evaluate its performance through a simulation study and demonstrate its application through an empirical example.</p>
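The first stage of such a procedure, placing a conjugate inverse-Wishart prior on Σ and sampling its posterior directly, can be sketched as below. The data, the prior hyperparameters, and the known-zero-mean simplification are illustrative assumptions, and the second stage (mapping each posterior draw of Σ to model parameters θ, e.g., by minimizing a discrepancy function) is omitted:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
p, n = 3, 200
true_cov = np.array([[1.0, 0.5, 0.3],
                     [0.5, 1.0, 0.4],
                     [0.3, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

# Conjugate update: with prior IW(nu0, Lambda0) on Sigma and a known (zero)
# mean, the posterior is IW(nu0 + n, Lambda0 + S), S the scatter matrix.
nu0, Lambda0 = p + 2, np.eye(p)
S = X.T @ X
posterior = invwishart(df=nu0 + n, scale=Lambda0 + S)
draws = posterior.rvs(size=1000, random_state=1)  # posterior draws of Sigma
post_mean = draws.mean(axis=0)
```

Each draw in `draws` is a full covariance matrix; in the procedure described above, each would then be transformed into a draw of θ under the hypothesized SEM.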
Theory and MethodsStructural equation modelingBayesian analysisInverse Wishart priorInformative priorConvergence diagnosticsHaiyan LiuWen QuZhiyong ZhangHao Wu
Copyright (c) 2022 Journal of Behavioral Data Science
2022-08-072022-08-07234610.35566/jbds/v2n2/p2The Impact of Sample Size on Exchangeability in the Bayesian Synthesis Approach to Data Fusion
https://jbds.isdsa.org/jbds/article/view/38
<p>Data fusion approaches have been adopted to facilitate more complex analyses and produce more accurate results. Bayesian Synthesis is a relatively new approach to data fusion where results from the analysis of one dataset are used as prior information for the analysis of the next dataset. Datasets of interest are sequentially analyzed until a final posterior distribution is created, incorporating information from all candidate datasets, rather than simply combining the datasets into one large dataset and analyzing them simultaneously. One concern with this approach lies in the sequence of datasets being fused. This study examines whether the order of datasets matters when the datasets being fused each have substantially different sample sizes. The performance of Bayesian Synthesis with varied sample sizes is evaluated by examining results from simulated data with known population values under a variety of conditions. Results suggest that the order in which the datasets are fused can have a significant impact on the obtained estimates.</p>
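The sequential mechanics of Bayesian synthesis can be illustrated with a deliberately simple conjugate normal model with known variance (an assumption made purely for tractability; in this special case the final posterior happens to be order-invariant, which is not generally true for the more complex models the study examines):

```python
import numpy as np

def update_normal_mean(prior_mu, prior_var, data, sigma2):
    """One conjugate normal-normal update: posterior mean and variance of a
    population mean, given data with known error variance sigma2."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mu = post_var * (prior_mu / prior_var + np.sum(data) / sigma2)
    return post_mu, post_var

# Three hypothetical datasets of very different sizes, fused sequentially:
# the posterior from each analysis becomes the prior for the next.
datasets = [np.array([1.2, 0.8, 1.1]), np.array([0.9, 1.3]), np.full(50, 1.0)]
mu, var = 0.0, 10.0          # diffuse initial prior
sigma2 = 1.0                 # known error variance (assumed)
for d in datasets:
    mu, var = update_normal_mean(mu, var, d, sigma2)
```

Reordering `datasets` here leaves the final posterior unchanged; the study's point is that this exchangeability can break down for realistic models and unequal sample sizes.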
Theory and MethodsBayesian synthesisData fusionExchangeabilityKaterina MarcoulidesJia QuanEric Wright
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-262022-07-267510510.35566/jbds/v2n1/p5Disentangling the Influence of Data Contamination in Growth Curve Modeling: A Median Based Bayesian Approach
https://jbds.isdsa.org/jbds/article/view/40
<p>Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied under the normal distribution assumption, empirical data often violate the normality assumption in applications. Failure to account for the deviation from normality in data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and outperformed traditional growth curve modeling when outliers that result in nonnormality are present. However, this robust approach was shown to perform less satisfactorily when leverage observations existed. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of the random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study was conducted to evaluate the numerical performance of the proposed approach, and showed that the proposed approach yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.</p>
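The Laplace device mentioned above rests on a standard fact: the Laplace log-likelihood in a location parameter is, up to constants, the negative sum of absolute deviations, so maximizing it yields the median. A toy illustration with hypothetical data (not the VCAP dataset), showing the resulting robustness to a gross outlier:

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1.0, 2.0, 2.5, 3.0, 100.0])   # one gross outlier

# Laplace negative log-likelihood in a location parameter mu (scale fixed
# at 1) is, up to constants, sum |y_i - mu|; its minimizer is the median.
nll = lambda mu: np.sum(np.abs(y - mu))
mu_hat = minimize_scalar(nll, bounds=(-10.0, 110.0), method="bounded").x
```

Here `mu_hat` sits at the sample median (2.5) rather than being dragged toward the outlier as the mean (21.7) is; the DOME GCM applies this principle twice, to the measurement errors and to the random effects.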
Theory and MethodsRobust methodsgrowth curve modelingconditional mediansLaplace distributionTonghao ZhangXin TongJianhui Zhou
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-272022-07-2712210.35566/jbds/v2n2/p1How to Select the Best Fit Model among Bayesian Latent Growth Models for Complex Data
https://jbds.isdsa.org/jbds/article/view/39
<p>The Bayesian approach is becoming increasingly important as it provides many advantages in dealing with complex data. However, there is no well-defined model selection criterion or index in a Bayesian context. To address this challenge, new indices are needed. The goal of this study is to propose new model selection indices and to investigate their performance in the framework of latent growth mixture models with missing data and outliers in a Bayesian context. We consider latent growth models because they are very flexible in modeling complex data and are becoming increasingly popular in statistical, psychological, behavioral, and educational areas. Specifically, this study conducted five simulation studies to cover different cases, including latent growth curve models with missing data, latent growth curve models with missing data and outliers, growth mixture models with missing data and outliers, extended growth mixture models with missing data and outliers, and latent growth models with different classes. Simulation results show that almost all proposed indices can effectively identify the true model. This study also illustrated the application of these model selection indices in real data analysis.</p>
Theory and MethodsModel Selection CriterionBayesian EstimationLatent Growth ModelsMissing DataRobust MethodLaura LuZhiyong Zhang
Copyright (c) 2022 Journal of Behavioral Data Science
2022-06-232022-06-23355810.35566/jbds/v2n1/p2Does Minority Case Sampling Improve Performance with Imbalanced Outcomes in Psychological Research?
https://jbds.isdsa.org/jbds/article/view/37
<p>In psychological research, class imbalance in binary outcome variables is a common occurrence, particularly in clinical variables (e.g., suicide outcomes). Class imbalance can present a number of difficulties for inference and prediction, prompting the development of a number of strategies that perform data augmentation through random sampling from just the positive cases, or from both the positive and negative cases. Through evaluation in benchmark datasets from computer science, these methods have shown marked improvements in predictive performance when the outcome is imbalanced. However, questions remain regarding generalizability to psychological data. To study this, we conducted a simulation study testing a number of popular sampling strategies implemented in easy-to-use software, and applied them in an empirical example focusing on the prediction of suicidal thoughts. In general, we found that while one sampling strategy demonstrated far worse performance even in comparison to no sampling, the other sampling methods performed similarly, evidencing slight improvements over no sampling. Further, we evaluated the sampling strategies across different forms of cross-validation, model fit metrics, and machine learning algorithms.</p>
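As one concrete member of the family of strategies evaluated (a generic sketch, not necessarily the exact implementation the study used), naive random oversampling duplicates minority-class cases until the classes are balanced:

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Naive random oversampling: resample minority-class rows with
    replacement until every class matches the largest class count."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx.append(np.concatenate([c_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Hypothetical imbalanced data: 8 negative cases, 2 positive cases.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X, y)
```

More elaborate strategies (e.g., synthetic minority oversampling) interpolate new minority cases instead of duplicating observed ones; the trade-offs among such strategies are what the simulation study evaluates.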
Theory and MethodsImbalanced dataSampling strategiesMachine learningRoss JacobucciXiaobei Li
Copyright (c) 2022 Journal of Behavioral Data Science
2022-06-152022-06-15597410.35566/jbds/v2n1/p3Book Review: An Introduction to Nonparametric Statistics
https://jbds.isdsa.org/jbds/article/view/34
<p>This is a brief comparative review of the book An Introduction to Nonparametric Statistics.</p>
Book and Software ReviewsBook reviewNonparametric StatisticsR softwareKévin Allan Sales Rodrigues
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-052022-07-0512412710.35566/jbds/v2n1/p8The Role of Personality in Trust in Public Policy Automation
https://jbds.isdsa.org/jbds/article/view/33
<p>Algorithms play an increasingly important role in public policy decision-making. Despite this consequential role, little effort has been made to evaluate the extent to which people trust algorithms in decision-making, much less the personality characteristics associated with higher levels of trust. Such evaluations inform the widespread adoption and efficacy of algorithms in public policy decision-making. We explore the role of major personality inventories -- need for cognition, need to evaluate, the "Big 5" -- in shaping an individual's trust in public policy algorithms, specifically dealing with criminal justice sentencing. Through an original survey experiment, we find strong correlations between all personality types and general levels of trust in automation, as expected. Further, we uncovered evidence that need for cognition increases the weight given to advice from an algorithm relative to humans, and "agreeableness" decreases the distance between respondents' expectations and advice from a judge, relative to advice from a crowd.</p>
Application and Case StudiesPersonality Trust in automationPublic policyDecision-makingPhilip WaggonerRyan Kennedy
Copyright (c) 2022 Journal of Behavioral Data Science
2022-05-112022-05-1110612310.35566/jbds/v2n1/p4/The Lighting of the BECONs
https://jbds.isdsa.org/jbds/article/view/26
<div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <p>The imposition of lockdowns in response to the COVID-19 outbreak has underscored the importance of human behavior in mitigating virus transmission. The scientific study of interventions designed to change behavior (e.g., to promote physical distancing) requires measures of effectiveness that are fast, that can be assessed through experiments, and that can be investigated without actual virus transmission. This paper presents a methodological approach designed to deliver such indicators. We show how behavioral data, obtainable through wearable assessment devices or camera footage, can be used to assess the effect of interventions in experimental research; in addition, the approach can be extended to longitudinal data involving contact tracing apps. Our methodology operates by constructing a contact network: a representation that encodes which individuals have been in physical proximity long enough to transmit the virus. Because behavioral interventions alter the contact network, a comparison of contact networks before and after the intervention can provide information on the effectiveness of the intervention. We coin indicators based on this idea Behavioral Contact Network (BECON) indicators. We examine the performance of three indicators: the Density BECON, based on differences in network density; the Spectral BECON, based on differences in the eigenvector of the adjacency matrix; and the ASPL BECON, based on differences in average shortest path lengths. Using simulations, we show that all three indicators can effectively track the effect of behavioral interventions. Even in conditions with significant amounts of noise, BECON indicators can reliably identify and order effect sizes of interventions. The present paper invites further study of the method as well as practical implementations to test the validity of BECON indicators in real data.</p> </div> </div> </div>
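As an illustration of the simplest of the three indicators, the Density BECON compares the density of the contact network before and after an intervention; the two adjacency matrices below are hypothetical:

```python
import numpy as np

def contact_density(adj):
    """Density of an undirected contact network: the fraction of possible
    pairs that were in contact, computed from the upper triangle of the
    adjacency matrix."""
    n = adj.shape[0]
    return adj[np.triu_indices(n, k=1)].mean()

# Hypothetical contact networks before and after a distancing intervention:
before = np.array([[0, 1, 1, 1],
                   [1, 0, 1, 1],
                   [1, 1, 0, 1],
                   [1, 1, 1, 0]])    # everyone in contact with everyone
after = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])     # contacts confined to two pairs

density_becon = contact_density(before) - contact_density(after)
```

A larger positive difference indicates a stronger intervention effect on contact opportunities; the Spectral and ASPL BECONs replace density with eigenvector-based and shortest-path-based summaries of the same networks.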
Theory and MethodsCovid-19contact networkbehavioral data sciencephysical distancingsocial distancingintervention effectsnetwork analysisDenny BorsboomTessa BlankenFabian DablanderFrenk van HarreveldCharlotte TanisPiet Van Mieghem
Copyright (c) 2022 Journal of Behavioral Data Science
2022-07-042022-07-0413410.35566/jbds/v2n1/p1