The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a member of the genus Betacoronavirus, causes the coronavirus disease 2019 (COVID-19). COVID-19 was declared a global pandemic by the World Health Organization (WHO) in March 2020. The clinical manifestations of COVID-19 range from asymptomatic and mild upper respiratory tract infections to severe viral pneumonia.
Study: AImmune: a new blood-based machine learning approach to improving immune profiling analysis on COVID-19 patients.
*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
Background
Studies on SARS-CoV-2 indicate that hyperinflammatory responses are related to disease severity and mortality. Monitoring these responses could therefore help to identify high-risk patients and to tailor treatment to their clinical characteristics.
Several aberrant immune phenomena have been associated with COVID-19 severity, including lymphopenia, impaired interferon activity, monocyte-macrophage abnormalities, and cytokine storms. These studies also reveal the role of immunomodulatory agents in attenuating the overactive immune response related to SARS-CoV-2 infection.
Studies on SARS-CoV-2 have generated a large amount of single-cell ribonucleic acid sequencing (scRNA-seq) data. Beyond meta-analysis and data integration, these data remain underused.
However, machine learning can reanalyze these data and provide a new understanding of the development of COVID-19. Changes in the level of lymphocyte subpopulations in the blood can serve as a biomarker for diagnosis and severity prediction of COVID-19.
A new study published on the preprint server medRxiv* discusses the application of a novel deconvolution model, AImmune, that is capable of predicting the proportions of seven different immune cell types from bulk RNA-seq results of human peripheral blood mononuclear cells (PBMCs). This model has the potential to replace the costly scRNA-seq technique for clinical and research purposes.
About the study
The current study involved 23 healthy subjects, 10 moderate cases, and 12 severe cases. The human PBMC scRNA-seq datasets were downloaded from Gene Expression Omnibus (GEO) under three different accession IDs and then processed. The scRNA-seq data were then used to generate pseudo-bulk RNA-seq samples for the development of the AImmune 2.0 model.
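The pseudo-bulk idea can be pictured as pooling single cells whose types are already known, so that each pooled sample comes with its true cell-type proportions. The Python sketch below is illustrative only and is not the preprint's pipeline: the function name `make_pseudo_bulk`, the toy cell format, the sampling with replacement, and all count values are assumptions made for this example.

```python
import random
from collections import Counter


def make_pseudo_bulk(cells, n_cells, rng):
    """Pool randomly sampled single cells into one pseudo-bulk sample.

    cells: list of (cell_type, counts) pairs, where counts is a list of
           per-gene read counts (hypothetical toy format).
    Returns (bulk_counts, proportions).
    """
    # Sample cells with replacement (an assumption of this sketch).
    sampled = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(sampled[0][1])
    # Pseudo-bulk expression: per-gene sum of counts over the sampled cells.
    bulk = [sum(counts[g] for _, counts in sampled) for g in range(n_genes)]
    # Training label: fraction of sampled cells belonging to each cell type.
    type_counts = Counter(cell_type for cell_type, _ in sampled)
    proportions = {t: c / n_cells for t, c in type_counts.items()}
    return bulk, proportions


# Toy data: three genes, two cell types (values are made up for illustration).
cells = [
    ("T cell", [5, 0, 1]),
    ("T cell", [4, 1, 0]),
    ("Monocyte", [0, 7, 2]),
    ("Monocyte", [1, 6, 3]),
]
rng = random.Random(0)
bulk, proportions = make_pseudo_bulk(cells, n_cells=100, rng=rng)
```

Each pair of `bulk` and `proportions` would then act as one training example: the pooled expression is the model input and the known proportions are the target.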
The input data were pre-processed to make them more suitable for machine learning algorithms. Thereafter, feature selection and model training were carried out. Finally, CIBERSORTx was used to predict leukocyte subsets, and differentially expressed gene (DEG) analysis was performed to identify DEGs.
Study findings
The results identified 107 DEGs from 17,206 genes that could distinguish COVID-19 subgroups from the healthy subjects, some of which could serve as important biomarkers. The gene ontology results indicated that degranulation of neutrophils was a major immune response against COVID-19. Also, an additional layer of immune regulation or activation was observed in moderate cases as compared to severe cases.
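As a rough illustration of what DEG screening involves, the Python sketch below ranks genes by log2 fold change between two groups. It is a simplified stand-in, not the method used in the preprint: real DEG analyses also apply statistical testing and multiple-testing correction, and the gene names, expression values, pseudocount, and threshold here are all hypothetical.

```python
import math


def log2_fold_change(case_values, control_values, pseudocount=1.0):
    """Log2 ratio of mean expression in cases versus controls.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no expression in one group.
    """
    case_mean = sum(case_values) / len(case_values)
    control_mean = sum(control_values) / len(control_values)
    return math.log2((case_mean + pseudocount) / (control_mean + pseudocount))


# Made-up normalized expression values for two hypothetical genes.
expression = {
    "GENE_A": {"covid": [30.0, 34.0, 29.0], "healthy": [7.0, 6.0, 8.0]},
    "GENE_B": {"covid": [10.0, 11.0, 9.0], "healthy": [10.0, 9.0, 11.0]},
}

threshold = 1.0  # keep genes with at least a two-fold change in either direction
candidates = [
    gene
    for gene, groups in expression.items()
    if abs(log2_fold_change(groups["covid"], groups["healthy"])) >= threshold
]
```

In this toy example only `GENE_A` passes the filter, because its mean expression differs about four-fold between the groups while `GENE_B` is unchanged.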
The results also showed that the numbers of T-lymphocytes, natural killer (NK) cells, and dendritic cells (DCs) were lower in the moderate subgroup than in healthy subjects, and lower still in the severe subgroup. However, the number of monocytes was higher in COVID-19 cases than in healthy subjects.
Ultimately, AImmune was found to be more reliable than CIBERSORTx in predicting immune cell subsets. The AImmune model is therefore of clinical importance, as it has high prediction accuracy, increases the accessibility of immune profiling in clinical applications, and is less expensive than flow cytometry and scRNA-seq.
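One common way to compare deconvolution tools is to score their predicted cell-type fractions against known fractions. The Python sketch below shows two such scores, root-mean-square error and Pearson correlation. The choice of metrics is an assumption, since this summary does not state how reliability was measured, and `model_a`, `model_b`, and every number below are invented placeholders rather than results for AImmune or CIBERSORTx.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true fractions."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical true fractions for three cell types in one sample.
true_fraction = [0.50, 0.30, 0.20]
# Hypothetical predictions from two unnamed models.
model_a = [0.48, 0.31, 0.21]
model_b = [0.40, 0.35, 0.25]

scores = {
    "model_a": (rmse(model_a, true_fraction), pearson(model_a, true_fraction)),
    "model_b": (rmse(model_b, true_fraction), pearson(model_b, true_fraction)),
}
```

A lower error and a correlation closer to 1 would both indicate predictions that track the true proportions more closely.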
Limitations
The current study had certain limitations. First, because the dataset was small, the results may not generalize.
Second, because the datasets originated from different studies, there are substantial inconsistencies in sample acquisition, processing, and sequencing procedures. Third, the model's predictions are relatively poor for DCs.