Advances in AI-driven modeling improve outbreak predictions, but success hinges on data accessibility.
Study: Artificial intelligence for modelling infectious disease epidemics.
Artificial intelligence (AI) has significantly improved predictions of pathogen emergence and transmissibility. A recent Nature study emphasizes that the continued success of this technology depends on data transparency and reduced training costs.
How is AI used in healthcare?
Infectious disease epidemiology focuses on the emergence and transmission of infectious diseases among the population and strategies to prevent, control, and mitigate disease outbreaks.
Numerous AI-based applications have been developed to support human health, including patient diagnosis, decision support for doctors, and individual-level disease risk prediction. To date, AI has been used to a lesser extent in infectious disease epidemiology, which may be attributed to challenges in obtaining the large-scale, standardized, and representative data essential for training and evaluating AI or machine learning (ML) models with variable parameters.
Nevertheless, newer AI models can answer epidemiological questions more competently, even when trained on smaller amounts of data.
The potential of AI applications in infectious disease epidemiology
In the early stages of any infectious outbreak, it is crucial to understand disease severity and the epidemic potential of the pathogen. Since the true sequence of events and location of the original infection are often uncertain, researchers frequently experience difficulties in estimating the incubation period and transmission intensity from observational data.
Bayesian data augmentation has been invaluable for improving parameter inference. Moreover, integrating AI into the Bayesian data augmentation approach has significantly improved the scalability and inference of these models.
Conventional mechanistic and semi-mechanistic disease transmission models provide important insights into viral transmission and are used to develop counterfactual scenarios. However, these models are associated with considerable computational costs, which are partly due to the extensive complexities involved in numerical methods and inference in a high-dimensional parameter space.
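To make the idea of a mechanistic model and a counterfactual scenario concrete, the following is a minimal sketch of a basic susceptible-infected-recovered (SIR) model in Python. It is an illustration only and is not taken from the study: the function name, the parameter values, and the assumption that an intervention lowers the transmission rate from 0.4 to 0.3 per day are all hypothetical.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, dt=0.1):
    """Integrate a basic SIR model with forward Euler steps.

    beta  : transmission rate per day (assumed value, for illustration)
    gamma : recovery rate per day (assumed value, for illustration)
    Returns a list of daily (susceptible, infected, recovered) tuples.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Baseline scenario versus a hypothetical counterfactual in which an
# intervention reduces the transmission rate.
baseline = simulate_sir(beta=0.4, gamma=0.2, population=1_000_000,
                        initial_infected=10, days=200)
counterfactual = simulate_sir(beta=0.3, gamma=0.2, population=1_000_000,
                              initial_infected=10, days=200)

peak_baseline = max(i for _, i, _ in baseline)
peak_counterfactual = max(i for _, i, _ in counterfactual)
print(f"Peak infections, baseline: {peak_baseline:,.0f}")
print(f"Peak infections, counterfactual: {peak_counterfactual:,.0f}")
```

This toy model has two parameters and runs in a fraction of a second. The models described in the study are far larger, and fitting them to data means running simulations like this many times over a high-dimensional parameter space, which is where the computational cost arises.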
Recent advances in AI modelling offer the possibility of accelerating inference by using variational inference, thereby allowing greater model complexity and realism. AI-accelerated methods can potentially reduce model run times from weeks to hours, which can create more opportunities to understand potential associations between individual transmission heterogeneity and population-level outcomes.
The graph neural network (GNN) is a promising AI system that can improve the understanding and accurate forecasting of infectious disease dynamics. Recently, GNN models have accurately predicted coronavirus disease 2019 (COVID-19) cases per region and influenza-like illness rates.
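The core operation of a GNN, in which each node updates its state using information from its neighbors, can be sketched in a few lines of Python. This is a simplified illustration and not the architecture of the models mentioned above: the regions, case counts, links, and weights are hypothetical, each region carries a single number rather than a feature vector, and the weights are fixed by hand rather than learned from data.

```python
def message_passing_layer(features, neighbors, weight_self, weight_neighbors):
    """Apply one simplified graph message-passing step.

    Each region combines its own value with the mean value of its
    linked regions, then applies a ReLU non-linearity.
    """
    updated = {}
    for region, value in features.items():
        linked = neighbors.get(region, [])
        if linked:
            neighbor_mean = sum(features[n] for n in linked) / len(linked)
        else:
            neighbor_mean = 0.0
        combined = weight_self * value + weight_neighbors * neighbor_mean
        updated[region] = max(0.0, combined)  # ReLU
    return updated


# Hypothetical weekly case counts for three regions and their travel links.
cases = {"A": 120.0, "B": 30.0, "C": 0.0}
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

hidden = message_passing_layer(cases, links,
                               weight_self=0.8, weight_neighbors=0.2)
print(hidden)
```

Region C reports no cases of its own but still receives a non-zero value because it is linked to region B. Stacking several such layers lets information spread across more distant regions, which is one reason graph-based models suit spatial forecasting.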
AI models are also applied to genomic data to elucidate virus lineages, viral origin, pathogenicity, transmissibility, and the pathogen’s potential to evade immune responses. These models have improved the accuracy of phylogenetic inference, thereby offering a precise characterization of the infection process.
How does AI help policymakers make public health decisions?
During an infectious disease epidemic, policy decisions are often made based on estimates of the number of current cases and forecasts of future cases. Importantly, epidemic surveillance data are almost always affected by biases in reporting, testing, and sampling.
During the COVID-19 pandemic, researchers significantly accelerated progress towards more standardized and rigorous models that allow policymakers to make appropriate public health decisions. Foundation models based on large deep neural networks are a powerful approach to explore and elucidate time-series surveillance data.
New ML and AI approaches have substantially reduced the time required to run epidemiological models to analyze complex scenarios and their statistical uncertainties. Large language models (LLMs) provide summaries of complex quantitative models that are personalized to a decision maker’s preferences.
The successful and appropriate use of AI tools depends on the careful analysis and resolution of key ethical challenges. For example, AI tools for pandemic preparedness and prevention largely depend on fair practices for the collection, storage, and sharing of data, as this ensures widespread accessibility of AI models.
Limitations and recommendations
Current AI models often fail to provide mechanistic insights into the transmission process, lack the power to predict beyond previously observed data and scenarios, and cannot communicate key epidemiological questions and concepts. In the future, an AI-infectious disease assistant could be developed by integrating single-task models into more general foundation models.
The potential benefits of AI in public health depend on the availability and accessibility of representative data. A firm ethical framework for storing and sharing data is essential for successful applications of AI in epidemiology.
Since the COVID-19 pandemic, significantly more data has become available to train novel AI models. Nevertheless, routine surveillance data for infectious diseases remains inaccessible to the broader community, which prevents the development of an improved disease modeling system.
The restricted application of AI models has been attributed to high training costs. Robust data transparency and ethical sharing will be essential for developing highly accurate new models at a reduced cost.
Journal reference:
- Kraemer, M. U., Tsui, J. L. H., Chang, S. Y., et al. (2025) Artificial intelligence for modelling infectious disease epidemics. Nature 638(8051); 623-635. doi:10.1038/s41586-024-08564-w