COVID-19 Outbreak Prediction and Analysis using Self Reported Symptoms


  • Rohan Sukumaran, PathCheck Foundation
  • Parth Patwa, PathCheck Foundation
  • Sethuraman T V, PathCheck Foundation
  • Sheshank Shankar
  • Rishank Kanaparti, PathCheck Foundation
  • Joseph Bae, PathCheck Foundation & Stony Brook Medicine
  • Yash Mathur, PathCheck Foundation
  • Abhishek Singh, MIT Media Lab
  • Ayush Chopra, MIT Media Lab
  • Myungsun Kang, PathCheck Foundation
  • Priya Ramaswamy, PathCheck Foundation & University of California San Francisco
  • Ramesh Raskar, PathCheck Foundation & MIT Media Lab



Keywords: Machine Learning, COVID-19, Outbreak Prediction, Time Series


It is crucial for policymakers to understand the community prevalence of COVID-19 so that combative resources can be effectively allocated and prioritized during the pandemic. Traditionally, community prevalence has been assessed through diagnostic and antibody testing data. However, despite the increasing availability of COVID-19 testing, the required level of testing has not been met in parts of the globe, creating a need for an alternative method for communities to determine disease prevalence. This is further complicated by the observation that COVID-19 prevalence and spread vary across different spatial, temporal, and demographic verticals. In this study, we examine trends in the spread of COVID-19 by using the results of self-reported COVID-19 symptom surveys as a complement to COVID-19 testing reports. This allows us to assess community disease prevalence even in areas with low COVID-19 testing capacity. Using individually reported symptom data from various populations, our method predicts the likely percentage of the population that tested positive for COVID-19. We achieve a mean absolute error (MAE) of 1.14 and a mean relative error (MRE) of 60.40%, with a 95% confidence interval of [60.12, 60.67]. This implies that our model's prediction differs from the actual count by about ±1,140 cases in a population of 1 million. In addition, we forecast the location-wise percentage of the population testing positive for the next 30 days using self-reported symptom data from previous days. The MAE for this method is as low as 0.15 (MRE of 11.28%, with a 95% confidence interval of [10.9, 11.6]) for New York. We present an analysis of these results, exposing various clinical attributes of interest across different demographics. Lastly, we qualitatively analyze how various policy enactments (testing, curfew) affect the prevalence of COVID-19 in a community.
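
For readers unfamiliar with the two error metrics quoted above, the following is a minimal sketch of how MAE and MRE are commonly computed. It assumes the usual definitions (mean of absolute differences, and mean of absolute differences divided by the actual value, expressed as a percentage); the abstract does not spell out the exact formulas used in the study, and the function names and sample values below are hypothetical, chosen only for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average absolute difference relative to the actual value, in percent.

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical percentages of a population testing positive (not study data).
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 5.0, 5.5]

mae = mean_absolute_error(actual, predicted)
mre = mean_relative_error(actual, predicted)
print(f"MAE = {mae:.2f}, MRE = {mre:.2f}%")
```

Because MRE divides each error by the actual value, it can be large when the true percentages are small even if the MAE is modest, which is one reason the two metrics are often reported together.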






Theory and Methods

How to Cite

Sukumaran, R., Patwa, P., T V, S., Shankar, S., Kanaparti, R., Bae, J., Mathur, Y., Singh, A., Chopra, A., Kang, M., Ramaswamy, P., & Raskar, R. (2021). COVID-19 Outbreak Prediction and Analysis using Self Reported Symptoms. Journal of Behavioral Data Science, 1(1), 154-169.