Researchers in China have proposed a machine learning-based approach that predicts the transmission of COVID-19 and estimates the number of asymptomatic cases more accurately than classic virus transmission models.
Using the machine learning-based fine-grained simulator (MLSim) to improve modeling of the complex transmission process and the potential number of asymptomatic cases could help decision-makers balance containment measures more effectively.
Asymptomatic individuals
The COVID-19 virus, which can be transmitted between humans, does not always cause the associated disease symptoms such as respiratory problems, sore throat, and fever. However, asymptomatic people are still infectious and able to spread the disease to others. A growing body of evidence suggests that the number of asymptomatic but infectious individuals is increasing, with one study estimating that up to 60% of patients could be either asymptomatic or only suffering from mild symptoms.
Estimating the number of asymptomatic infections that have gone undetected is essential to containing the spread of the virus. However, this is difficult to do accurately.
“Meanwhile, if we can model how the virus transmits, it is fully possible to make inference on the unobserved number of asymptomatic patients from the observed epidemic data,” write Zhi-Hua Zhou (Nanjing University, Nanjing) and colleagues.
The article is currently available as a preprint on the medRxiv server while it undergoes peer review.
Current models for estimating transmission
Currently, there are two main ways to model transmission and predict disease spread.
One is to use dynamic transmission models such as the Susceptible Infected Recovered (SIR) model, which tracks susceptible, infectious, and recovered individuals and uses equations to model how people move between these groups. These models generalize well when predicting transmission over the long term. However, they often oversimplify the complex, real-world transmission process and can be difficult to match with epidemiologic data, which can introduce significant errors.
Another approach is to use machine learning models such as recurrent neural networks (RNNs) with long short-term memory (LSTM), which fit epidemic data well and make accurate predictions for the near future. However, such models struggle to make long-term predictions, are difficult to interpret, and cannot easily factor in the effects of different decisions.
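To make the compartmental idea concrete, the following is a minimal sketch of a standard SIR model integrated with simple forward Euler steps. It is not taken from the preprint: the function name `simulate_sir`, the transmission rate `beta`, the recovery rate `gamma`, and the population figures are all illustrative assumptions.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with forward Euler steps.

    beta  -- assumed transmission rate per day (illustrative value)
    gamma -- assumed recovery rate per day (illustrative value)
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # S -> I
            new_recoveries = gamma * i * dt          # I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical population of 10,000 with 10 initial infections.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"Infections peak on day {peak_day}")
```

With only two rate parameters, a model like this is easy to interpret, but it has little freedom to match the detail of real case data, which is the limitation described above.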
The team’s proposed approach
Now, Zhou and colleagues have tested the machine learning-based fine-grained simulator (MLSim) approach using parameters obtained from 31 provinces in China and six other countries.
“Traditional virus transmission models usually make more assumptions than MLSim and left only a few parameters to be determined,” they write. “Having more parameters to be optimized enables MLSim a better representation ability than traditional models.”
The authors say MLSim incorporates many practical factors, such as the progression of disease during the incubation period; people’s movement between regions; asymptomatic, undetected cases; and the effectiveness of prevention and containment measures.
How these factors interact is modeled using virtual transmission dynamics whose undetermined parameters are learned from the epidemic data by machine learning. Once it has learned to fit the real-world data closely, MLSim predicts the number of asymptomatic individuals.
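The general idea of fitting a simulator's free parameters to observed data, then reading off a quantity that was never observed, can be sketched as follows. This is a toy illustration and not MLSim itself: the two-parameter simulator, the names `simulate`, `beta`, and `detect`, the synthetic "observed" data, and the plain grid search (swapped in for the machine learning optimizer) are all assumptions made for the example.

```python
from itertools import product


def simulate(beta, detect, days=60, n=1_000_000, i0=20, gamma=0.1):
    """Toy daily simulator (hypothetical, not MLSim).

    beta   -- transmission rate, an undetermined parameter
    detect -- fraction of new infections that get confirmed
    Returns daily confirmed and daily undetected new infections.
    """
    s, i = float(n - i0), float(i0)
    confirmed, undetected = [], []
    for _ in range(days):
        new_inf = beta * s * i / n
        s -= new_inf
        i += new_inf - gamma * i
        confirmed.append(detect * new_inf)
        undetected.append((1 - detect) * new_inf)
    return confirmed, undetected


def squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Synthetic stand-in for real confirmed-case data (assumed values).
TRUE_BETA, TRUE_DETECT = 0.25, 0.4
observed, _ = simulate(TRUE_BETA, TRUE_DETECT)

# Grid search over the undetermined parameters; only confirmed cases are used.
beta_grid = [b / 100 for b in range(10, 51, 5)]
detect_grid = [d / 10 for d in range(1, 10)]
best_beta, best_detect = min(
    product(beta_grid, detect_grid),
    key=lambda p: squared_error(simulate(*p)[0], observed),
)

# With fitted parameters, the simulator exposes the unobserved quantity.
_, undetected = simulate(best_beta, best_detect)
inferred_undetected = sum(undetected)
print(best_beta, best_detect, round(inferred_undetected))
```

The simulator encodes prior knowledge about how transmission works, while the search supplies the parameter values the data support. A realistic simulator would have far more parameters and would need a more capable optimizer than a grid search.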
The simulator’s predictions
The team reports that MLSim made more accurate predictions than the SEIR model (an extension of SIR that adds an exposed compartment) and LSTM-based models. After learning from data available for China’s mainland, MLSim found that the number of asymptomatic individuals could have been 150,408, which represents 65% of the inferred total number of infections, including cases that had gone undetected. The inferred number of asymptomatic but infectious people on April 15th was 41,387 in Italy; 21,118 in Germany; 354,657 in the United States; 40,379 in France; and 144,424 in the UK.
The simulator results also revealed that if the containment measures that the government introduced for the country’s mainland had been put in place 1, 3, 5, and 7 days later than they were (January 23rd), the respective number of confirmed cases on June 12th would have been 109,039 (129%), 183,930 (218%), 313,342 (371%) and 537,555 (637%).
The team’s conclusion
“Machine learning-based fine-grained simulators can better model the complex real-world disease transmission process, and thus can help decision-making of balanced containment measures,” write Zhou and team. “The simulator also revealed the potential great number of undetected asymptomatic infections, which poses a great risk to the virus containment.”
The researchers also point out that this type of “hybrid knowledge and data learning approach” had not been commonly acknowledged among the machine learning community.
“But we found it very useful when the data is scarce while knowledge is rich but inaccurate, such as the situation of a new contagion outbreak,” concludes the team.
Important Notice
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.
Journal reference:
Zhou Z, et al. COVID-19 Asymptomatic Infection Estimation. MedRxiv 2020. doi: https://doi.org/10.1101/2020.04.19.20068072