Using structured gaussian processing algorithm for spatiotemporal prediction of COVID-19 pandemic in the US

By Neha Mathur, reviewed by Aimee Molineux, Apr 6 2022

In a recent study posted to the medRxiv* pre-print server, researchers identified spatial (county-level) features associated with increased coronavirus disease 2019 (COVID-19) case and death counts in the United States (US) across different temporal phases of the COVID-19 pandemic.

*Study: Using structured gaussian processing algorithm for spatiotemporal prediction of COVID-19 pandemic in the US. Image Credit: 3DJustincase/Shutterstock*

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

The team trained and tested a structured Gaussian process (SGP)-based machine learning framework on a large, geographically tagged dataset of demographic, socioeconomic, and political data from all US counties.

Background

The impact of COVID-19 has varied widely across the US in terms of both severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission and COVID-19 mortality.

In the US, public health interventions and resource allocation occur at the county level. Because COVID-19 spread depends on proximity, spatial analysis using geographic information systems (GIS) allowed researchers to investigate associations between demographic and socioeconomic factors and COVID-19 pandemic dynamics at the county level.

Further, it helped them identify and target the areas at the highest risk of becoming COVID-19 hotspots, to help flatten the pandemic curve.

About the study

In the present study, researchers gathered county-level daily case counts between January 22, 2020, and March 21, 2021, from the Center for Systems Science and Engineering at Johns Hopkins University; likewise, the United States Census Bureau and the National Center for Health Statistics provided county-specific features.

The team predicted daily COVID-19 case counts and death counts for each county using an SGP regression algorithm at the beginning of each week, starting April 6, 2020, until March 21, 2021.

The model was trained on a randomly selected two-thirds of the counties in each state and used to predict case and death counts for the remaining one-third. The team normalized the daily COVID-19 case and death counts per 100,000 residents and computed a seven-day moving average.
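The data preparation described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the fixed random seed, the rounding of the two-thirds cutoff, and the use of a trailing (rather than centered) seven-day window are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_counties_by_state(county_to_state, train_fraction=2 / 3, seed=0):
    """Randomly assign counties to train/test sets within each state.

    Hypothetical helper: the study's actual sampling procedure is not
    described in detail in the article.
    """
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for county, state in county_to_state.items():
        by_state[state].append(county)

    train, test = [], []
    for state in sorted(by_state):
        counties = sorted(by_state[state])  # sort first so the seed is reproducible
        rng.shuffle(counties)
        n_train = round(len(counties) * train_fraction)
        train.extend(counties[:n_train])
        test.extend(counties[n_train:])
    return train, test


def per_100k_moving_average(daily_counts, population, window=7):
    """Scale daily counts to a rate per 100,000 residents, then smooth.

    Assumes a trailing window: each output value averages the current day
    and the (window - 1) days before it.
    """
    rates = [count * 100_000 / population for count in daily_counts]
    return [
        sum(rates[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(rates))
    ]
```

Splitting within each state, rather than across the whole country, keeps every state represented in both the training and the held-out set.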

The team used Pearson’s correlation coefficient (PCC) to assess how well the predictions captured the dynamics of the event counts; likewise, the proportion of variance explained (R²) indicated how much of the total variation in the observed counts the model accounted for.
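For readers unfamiliar with the two metrics, here is a small sketch using their standard textbook definitions. Whether the study computed R² as one minus the ratio of residual to total sum of squares (as below) or in another way is an assumption, and the function names are illustrative.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r_squared(observed, predicted):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

The two metrics answer different questions. A prediction that follows the shape of the curve but is offset by a constant, such as `[2, 3, 4, 5]` against observed `[1, 2, 3, 4]`, has a PCC of 1.0 but an R² of only 0.2, because R² also penalizes errors in magnitude.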

After recognizing highly predictive spatial features, the researchers used a clustering algorithm termed topic modeling (TM) to identify combinations of spatial features closely linked to the COVID-19 spread.

TM computed sets of co-occurring features that could link counties to topics. The researchers segregated discrete groups of counties with similar spatial features (topic contributions) and derived nine clusters of counties based on the relative contributions of Latent Dirichlet Allocation (LDA) topics.

Within each cluster, they showed topic contributions by plotting the average z-score normalized topic score. Likewise, within each quintile, a histogram showed clusters of counties with a higher incidence of cases and deaths per capita.
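The per-cluster summary described above, an average of z-score normalized topic scores, can be sketched as follows. This is a hedged illustration only: the input layout (one row of topic scores per county, one cluster label per county), the function names, and the use of the population standard deviation are assumptions, not details confirmed by the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each column (topic) across all rows (counties)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def mean_topic_score_by_cluster(topic_scores, cluster_labels):
    """Average the z-scored topic scores of the counties in each cluster."""
    z_rows = zscore_columns(topic_scores)
    grouped = defaultdict(list)
    for row, label in zip(z_rows, cluster_labels):
        grouped[label].append(row)
    return {
        label: [mean(col) for col in zip(*members)]
        for label, members in grouped.items()
    }
```

A positive average for a topic means the cluster's counties score above the national mean on that topic, and a negative average means they score below it, which is what makes clusters comparable on a single plot.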

Study findings

The overall and median PCC and R² across counties were 0.96 and 0.98, and 0.84 and 0.94, respectively. The observed R² value greater than 0.90 (in most states) demonstrated that the study model built on spatial features could account for most of the variance in the COVID-19 case and death counts.

The predicted COVID-19 cases and death counts were strongly associated with measures of age, urbanicity, and presidential voting margin. Correlation analysis revealed that the interactions between socioeconomic, health, and racial features complicated the interpretation of the relationships between the spatial features and the COVID-19 dynamics.

TM was able to associate features with topics and could group geographically remote but demographically similar counties. Additionally, TM clustered many geographically similar counties. For instance, Cluster 1, concentrated in the Midwest region, which witnessed the largest surge in COVID-19 cases and deaths during 2020, comprised counties with high scores from topics 1, 3, and 9 and low scores from topic 10.

While TM showed that counties with similar demographic and socioeconomic features tended to cluster together, the unsupervised clustering based on these topics identified county groups that witnessed varying COVID-19 spread.

As clustering delineated cases from deaths and initial phase from nationwide phase dynamics, it highlighted plasticity in the composition of spatial features which were strongly associated with COVID-19 risk.

Accordingly, Cluster 3, geographically restricted to the Southeast US, was associated with high COVID-19 case counts during the initial phase, and Cluster 0, restricted to Texas and the Rocky Mountain region, was associated with high COVID-19 case counts during the nationwide phase.

Intriguingly, the presidential vote margin was the most consistently selected spatial feature in all the COVID-19 prediction models. It stood independently and showed no collinearity with other spatial factors.

Conclusions

To summarize, the study findings showed that spatial features accounted for the majority of variance in COVID-19 case and death counts across the US.

Predictive modeling based on combinations of spatial features could identify counties at the highest risk for COVID-19 spread and inform policymakers to prioritize these counties for aggressive mitigation strategies, especially under limited resources.

Importantly, TM provided a novel dimensionality reduction approach for examining epidemiologic data and proved to be a useful tool for analyzing datasets with collinear variables.

Journal references:

Preliminary scientific report. Cigdem Ak, Alex D Chitsazan, Mehmet Gonen, Ruth Etzioni, Aaron Grossberg. (2022). Spatial prediction of COVID-19 pandemic dynamics in the United States. medRxiv. doi: https://doi.org/10.1101/2022.03.27.22271628 https://www.medrxiv.org/content/10.1101/2022.03.30.22273175v1
Peer reviewed and published scientific report. Bruel, Timothée, Laurie Pinaud, Laura Tondeur, Delphine Planas, Isabelle Staropoli, Françoise Porrot, Florence Guivel-Benhassine, et al. 2022. “Neutralising Antibody Responses to SARS-CoV-2 Omicron among Elderly Nursing Home Residents Following a Booster Dose of BNT162b2 Vaccine: A Community-Based, Prospective, Longitudinal Cohort Study.” EClinicalMedicine 51 (September). https://doi.org/10.1016/j.eclinm.2022.101576. https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(22)00306-6/fulltext.

Article Revisions

May 12 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.




