A Tutorial on Supervised Machine Learning Variable Selection Methods in Classification for the Social and Health Sciences in R

Authors

  • Catherine Bain, University of Oklahoma
  • Dr. Dingjing Shi, Georgia Institute of Technology
  • Dr. Yaser M. Banad, University of Oklahoma
  • Dr. Lauren E. Ethridge, University of Oklahoma
  • Jordan E. Norris, University of Oklahoma
  • Dr. Jordan E. Loeffelman, University of Oklahoma

DOI:

https://doi.org/10.35566/jbds/bain

Keywords:

Machine Learning, Variable selection, Big data, Data classification, R

Abstract

With the increasing availability of large datasets in the behavioral and health sciences, the need for efficient and effective variable selection techniques has grown. While traditional methods such as stepwise regression remain prevalent, numerous more advanced techniques are available but underutilized in these fields. This tutorial aims to increase awareness and understanding of five variable selection methods available in the popular statistical software R: LASSO, Elastic Net, a penalized SVM classifier, random forest, and a genetic algorithm. Using a recent survey-based assessment dataset on misophonia diagnosis, we provide step-by-step guidance on variable selection and on the implementation of each method in the context of classification. We discuss the strengths, weaknesses, and performance of each technique, emphasizing the importance of selecting appropriate performance metrics. The code and data used in this tutorial are available on the Open Science Framework and provide an interactive learning experience. We encourage social and health science researchers to adopt these advanced variable selection methods, which can lead to more robust, interpretable, and impactful models. This paper assumes that readers have at least a basic understanding of R.

Author Biographies

  • Dr. Yaser M. Banad, University of Oklahoma

    Associate Professor in the School of Electrical and Computer Engineering 

  • Dr. Lauren E. Ethridge, University of Oklahoma

    Associate Professor, Department of Psychology

  • Jordan E. Norris, University of Oklahoma

    PhD Candidate, Department of Psychology

Published

2025-02-28

Section

Tutorials

How to Cite

Bain, C., Shi, D., Banad, Y., Ethridge, L., Norris, J., & Loeffelman, J. (2025). A Tutorial on Supervised Machine Learning Variable Selection Methods in Classification for the Social and Health Sciences in R. Journal of Behavioral Data Science, 5(1), 1-45. https://doi.org/10.35566/jbds/bain