In a recent article published in Nature Communications, researchers investigated the value of integrating sales of non-prescription medications (e.g., cough syrups) for improved forecasting of deaths from respiratory diseases in England using artificial intelligence (AI)-based predictive tools.
Background
Respiratory diseases, including influenza-like illnesses (ILI), coronavirus disease 2019 (COVID-19), asthma, chronic obstructive pulmonary disease (COPD), pneumonia, and bronchitis, are a leading cause of death worldwide.
In England and Wales, 369,900 deaths due to all respiratory diseases [International Classification of Diseases, Tenth Revision (ICD-10) codes J00-J99] occurred between 2015 and 2019. Moreover, COVID-19 has established itself as a long-standing disease that continues to burden healthcare systems with financial and logistical pressures.
Together, these observations call for novel methods to better forecast deaths due to respiratory diseases and their impact on vulnerable populations locally.
Researchers postulate that models incorporating non-prescription medication sales data could outperform models using sociodemographics and weather data traditionally associated with respiratory diseases, especially in the United Kingdom (UK), where transactional shopping data consist of longitudinal, time-stamped sales logs specified at the store level.
These data are updated in real time, enabling investigation of behavioral signals across populations and over time. Several prior studies have shown that such data could offer insights into population health, provided that the privacy, ethics, and transparency challenges related to their usage are addressed.
About the study
In the present study, researchers used data from over two billion shopping transactions logged by high-street retailers in the UK between March 2016 and March 2020 to predict registered deaths from respiratory diseases in 314 Lower Tier Local Authorities (LTLAs) across England.
They leveraged recent advances in variable importance analysis, centered on an AI explainability tool called Model Class Reliance (MCR).
The researchers implemented it on two machine learning (ML) models, Prediction of Amount of Deaths by Respiratory disease Using Sales (PADRUS) and Prediction of Amount of Deaths by Respiratory disease Using No Sales (PADRUNOS). This helped them examine how much non-prescription medication sales data (a variable) contributed to predictive performance, not only in one fitted model but across the set of models delivering comparable performance, known as the ‘Rashomon set’.
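To make the Rashomon set idea concrete, the following minimal sketch assumes a handful of hypothetical candidate models, each with a validation loss and a reliance score for one variable (say, cough medicine sales). The names, numbers, the tolerance `epsilon`, and the convention that a reliance of zero means "not used" are all illustrative assumptions rather than values or definitions from the study, which works with random forests instead of an enumerated list of models.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    loss: float      # validation error (lower is better)
    reliance: float  # how much the model depends on the variable of interest


def rashomon_set(models, epsilon):
    """Return the models whose loss is within `epsilon` of the best loss."""
    best = min(m.loss for m in models)
    return [m for m in models if m.loss <= best + epsilon]


def model_class_reliance(models, epsilon):
    """Return (lowest, highest) reliance on the variable across the Rashomon set."""
    near_optimal = rashomon_set(models, epsilon)
    reliances = [m.reliance for m in near_optimal]
    return min(reliances), max(reliances)


# Hypothetical candidates; every value here is made up for illustration.
candidates = [
    Candidate("model_a", loss=1.00, reliance=0.30),
    Candidate("model_b", loss=1.02, reliance=0.12),
    Candidate("model_c", loss=1.04, reliance=0.45),
    Candidate("model_d", loss=1.50, reliance=0.00),  # too inaccurate to count
]

mcr_minus, mcr_plus = model_class_reliance(candidates, epsilon=0.05)
print(mcr_minus, mcr_plus)
```

Under these assumptions, a lower bound above zero would indicate that every near-optimal model draws on the variable, so other inputs cannot fully stand in for it.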
Specifically, they compared the accuracy of PADRUS’s weekly predictions of deaths by respiratory disease in each local authority against those of two comparative models, the baseline and PADRUNOS. All three models (baseline, PADRUS, and PADRUNOS) were non-linear and used a random forest regressor, allowing for subsequent MCR analysis.
Further, the researchers created input sales features from cumulative weekly sales of cough, dry cough, mucus cough, decongestant, and throat medicines recorded by the retailer through point-of-sale (POS) logging systems. The number of days in each forecast horizon and sales lag followed from how the two data sources were reported: weekly deaths were registered on Fridays, whereas each sales week started on Wednesday and ended on Tuesday.
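The weekly alignment described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the helper names, the invented daily sales figures, and the choice to measure lag from the Tuesday that closes a sales week to the Friday registration date are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def sales_week_start(day):
    """Return the Wednesday opening the Wednesday-to-Tuesday week containing `day`."""
    # date.weekday(): Monday is 0, so Wednesday is 2.
    return day - timedelta(days=(day.weekday() - 2) % 7)


def weekly_sales(daily_sales):
    """Sum daily unit sales into Wednesday-to-Tuesday weekly totals."""
    totals = defaultdict(int)
    for day, units in daily_sales.items():
        totals[sales_week_start(day)] += units
    return dict(totals)


def lag_in_days(sales_week, registration_friday):
    """Days from the Tuesday closing a sales week to a Friday registration date."""
    week_end = sales_week + timedelta(days=6)
    return (registration_friday - week_end).days


# Hypothetical daily cough-medicine sales; values are invented.
daily = {
    date(2018, 1, 2): 40,   # Tuesday, closes the previous sales week
    date(2018, 1, 3): 55,   # Wednesday, opens a new sales week
    date(2018, 1, 5): 60,   # Friday
    date(2018, 1, 9): 70,   # Tuesday, closes that week
    date(2018, 1, 10): 80,  # Wednesday, opens the next week
}

weeks = weekly_sales(daily)
for start in sorted(weeks):
    print(start, weeks[start])

# Lag between the sales week opening on 3 January and deaths registered on 26 January.
print(lag_in_days(date(2018, 1, 3), date(2018, 1, 26)))
```

Keying each total by the Wednesday that opens the week keeps the sales series on one fixed calendar, so any lag to a Friday registration date is a whole number of days.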
Results
Preliminary results showed relatively small effect sizes for across-the-year accuracy gains; however, further analysis revealed that two factors had impeded improvements: over-simplification of the modeling task due to suppressed data, and a lack of attention to periods of volatile respiratory disease incidence.
Once these issues were addressed, including behavioral sales data yielded substantial gains, raising the predictive performance of out-of-sample forecasting (R²) by 0.11.
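For readers unfamiliar with the metric, the sketch below shows how an out-of-sample R² gain of this kind can be computed from held-out observations and two sets of predictions. The weekly death counts and both prediction series are invented for illustration and do not reproduce the study's figures.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical held-out weekly deaths for one local authority.
observed = [10, 12, 15, 20, 18, 11]

# Hypothetical predictions from a model without and with sales features.
pred_without_sales = [12, 12, 14, 16, 16, 13]
pred_with_sales = [11, 12, 15, 18, 17, 12]

r2_without = r_squared(observed, pred_without_sales)
r2_with = r_squared(observed, pred_with_sales)

print(f"R2 without sales: {r2_without:.3f}")
print(f"R2 with sales:    {r2_with:.3f}")
print(f"Gain:             {r2_with - r2_without:.3f}")
```

Because both models are scored on the same held-out weeks, the difference in R² can be read directly as the share of variance in deaths that the extra features help explain.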
In addition to confirming the influence of age and population size, MCR analyses showed that integrating cough medication sales within a 24-day lag led ML models to attain optimal performance. Further, MCR analysis of PADRUS could help identify up-to-date variables that are also easier to access and acceptable for use in public health surveillance systems monitoring respiratory diseases.
Furthermore, MCR analysis suggested that traditional variables were unable to compensate for deviations in disease incidence from seasonal norms. For instance, during the 2017-18 influenza season in the UK, seasonal/temperature variables in PADRUNOS could not adapt to unexpected co-circulation of influenza A and B strains, whereas PADRUS performed far more stably.
Indeed, variables obtained from empirical observation of human behavior provide more stable and accurate disease forecasts when incidence departs from expected patterns.
Conclusions
Overall, this study used four years of weekly retail sales data on medications for respiratory diseases in 314 LTLAs across England, providing more granular evidence for previously speculated associations between sales of cough and decongestant medicines and rates of ILIs, as well as later peaks in hospitalizations for respiratory illness.
Its results showed that including non-prescription medication sales data alongside traditional variables can enhance forecasting accuracy for respiratory deaths. Timely forecasts in disease surveillance can inform healthcare decision-making, provided that public health authorities have access to commercial medication sales data in real time.
Despite the variation across different countries and geographic regions, the key would be to balance the financial cost of incorporating sales data into disease surveillance systems against the benefits. Moreover, critical evaluation is needed to ensure that models can adapt to environmental and consumer changes, such as government-induced lockdowns preventing in-store sales.
At the same time, it would be crucial to carefully integrate relevant moderating variables into AI-based tools that track changes in variable importance across time series data and identify feature drift. Furthermore, research into using sales data to predict COVID-19 deaths, which defy the seasonal trends of ILIs, could help confirm these findings and offer an opportunity to assess an AI model’s ability to generalize to new data.
Journal reference:
- Dolan, E., Goulding, J., Marshall, H. et al. Assessing the value of integrating national longitudinal shopping data into respiratory disease forecasting models. Nat Commun 14, 7258 (2023). https://doi.org/10.1038/s41467-023-42776-4