Machine learning analysis suggests that there are four sub-phenotypes of long COVID


By Pooja Toshniwal Paharia. Reviewed by Danielle Ellis, B.Sc. Dec 6 2022

In a recent study published in Nature Medicine, researchers identified sub-phenotypes of post-acute sequelae of coronavirus disease 2019 (COVID-19), or PASC, based on conditions diagnosed within 1 to 6 months of acute infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

*Study: Data-driven identification of post-acute SARS-CoV-2 infection subphenotypes. Image Credit: males_design/Shutterstock*

Background

Studies have examined PASC conditions separately, without providing evidence on co-occurring conditions. Identifying sub-phenotypes, i.e., co-incidence patterns describing the degree to which PASC conditions and symptoms co-occur or develop disproportionately among particular patients, could help reveal PASC pathophysiology.

About the study

In the present study, researchers identified PASC sub-phenotypes by a data-driven approach based on machine learning.

The team used electronic health record (EHR) data from two large clinical research networks (CRNs) of the nationwide PCORnet (a patient-centered CRN): the INSIGHT CRN and the OneFlorida+ CRN. The INSIGHT CRN comprises 12 million New York City (NYC) residents, whereas the OneFlorida+ CRN comprises 19 million individuals residing in Florida, Georgia, and Alabama.

Patients from the INSIGHT and OneFlorida+ CRNs formed the development cohort (n=20,881) and the validation cohort (n=13,724), respectively. The study included SARS-CoV-2-positive individuals and assessed conditions that developed between 30 and 180 days after the reported COVID-19 diagnosis.

COVID-19 diagnosis was based on positive SARS-CoV-2 antigen test or nucleic acid amplification test reports between March 2020 and November 2021. The incidence of 137 potential PASC conditions, grouped into CCSR (Clinical Classifications Software Refined) categories defined by ICD-10 (International Classification of Diseases, 10th revision) codes, was assessed.

A topic modeling (TM) approach was used to identify co-incidence patterns of the PASC conditions, from which the PASC sub-phenotypes were derived. After obtaining high-dimensional binary representations of each patient's PASC conditions (step 1), the algorithm learned PASC topics (T) (step 2) and inferred patient representations in the low-dimensional PASC topic space (step 3). PASC sub-phenotypes were then determined by clustering patients on these topic representations (step 4).
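The four steps can be sketched in code. The block below is a minimal illustration, not the study's implementation. It uses a toy patient list and a small collapsed Gibbs sampler for latent Dirichlet allocation as a stand-in topic model, since the article does not say which topic model the authors used. It then assigns each patient to their dominant topic as a crude substitute for the clustering in step 4. All function names, data, and hyperparameters are hypothetical.

```python
import random

def binarize(patients, conditions):
    """Step 1: one 0/1 row per patient, one column per condition."""
    index = {c: j for j, c in enumerate(conditions)}
    rows = []
    for diagnosed in patients:
        row = [0] * len(conditions)
        for c in diagnosed:
            row[index[c]] = 1
        rows.append(row)
    return rows

def fit_topics(rows, n_topics, n_iter=200, alpha=0.1, beta=0.1, seed=0):
    """Steps 2-3: collapsed Gibbs sampling for an LDA-style topic model.

    Returns (topics, theta): topics[k][j] is the weight of condition j in
    topic k, and theta[i][k] is the share of topic k for patient i.
    """
    rng = random.Random(seed)
    n_cond = len(rows[0])
    # Each condition a patient has is one "token" of that patient's "document".
    docs = [[j for j, v in enumerate(row) if v] for row in rows]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_cond = [[0] * n_cond for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    for i, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]  # random initial topics
        assign.append(z)
        for j, k in zip(doc, z):
            doc_topic[i][k] += 1
            topic_cond[k][j] += 1
            topic_total[k] += 1
    for _ in range(n_iter):
        for i, doc in enumerate(docs):
            for pos, j in enumerate(doc):
                k = assign[i][pos]
                # Remove the token's current assignment, then resample it.
                doc_topic[i][k] -= 1
                topic_cond[k][j] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[i][t] + alpha)
                    * (topic_cond[t][j] + beta)
                    / (topic_total[t] + beta * n_cond)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[i][pos] = k
                doc_topic[i][k] += 1
                topic_cond[k][j] += 1
                topic_total[k] += 1
    topics = [
        [(topic_cond[k][j] + beta) / (topic_total[k] + beta * n_cond)
         for j in range(n_cond)]
        for k in range(n_topics)
    ]
    theta = [
        [(doc_topic[i][k] + alpha) / (len(doc) + alpha * n_topics)
         for k in range(n_topics)]
        for i, doc in enumerate(docs)
    ]
    return topics, theta

def dominant_topic(theta):
    """Step 4 (simplified): group patients by their largest topic share."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

# Toy data: hypothetical condition names and patients, for illustration only.
conditions = ["kidney failure", "heart failure", "anxiety", "sleep disorder"]
patients = [
    {"kidney failure", "heart failure"},
    {"kidney failure", "heart failure"},
    {"anxiety", "sleep disorder"},
    {"anxiety", "sleep disorder"},
]
rows = binarize(patients, conditions)
topics, theta = fit_topics(rows, n_topics=2)
groups = dominant_topic(theta)
```

Each row of `theta` is a patient's position in the low-dimensional topic space. In practice, a clustering algorithm applied to these rows would replace the dominant-topic shortcut used here.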

PASC co-incidence patterns of SARS-CoV-2-positive and SARS-CoV-2-negative individuals were compared using heat maps, and the entropy of every topic vector was calculated. The robustness of the identified PASC sub-phenotypes was evaluated using propensity score (PS) adjustments. Further, the team used cosine similarity to quantitatively compare the topics learned from the 137 PASC conditions in the two CRN cohorts.
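To illustrate the entropy and cosine similarity measures mentioned above, the following minimal sketch computes both for toy topic vectors. The vectors are made-up values, and the use of the natural logarithm is an assumption, since the article does not state the base used.

```python
import math

def entropy(topic):
    """Shannon entropy (natural log) of a topic vector, after normalizing it."""
    total = sum(topic)
    probs = [w / total for w in topic if w > 0]
    return -sum(p * math.log(p) for p in probs)

def cosine_similarity(a, b):
    """Cosine of the angle between two topic vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical topics over four conditions.
concentrated = [0.85, 0.05, 0.05, 0.05]  # weight piled on one condition
diffuse = [0.25, 0.25, 0.25, 0.25]       # weight spread evenly
```

A topic concentrated on a few conditions has low entropy, while one spread evenly across conditions has the maximum possible entropy. A cosine similarity near 1 indicates that two topics weight conditions in nearly the same proportions.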

Results

Four PASC sub-phenotypes were identified. Sub-phenotype 1 comprised 7,047 (34%) patients and was dominated by renal, circulatory, and cardiac conditions (T-3, 8, 10), such as kidney failure, circulatory and cardiac disorders, and fluid and electrolyte imbalance. The median patient age was 65 years, and 49% were men. These patients had high acute COVID-19 severity: 61% were hospitalized, 5.0% required mechanical ventilation, and 10% were admitted to critical care.

This sub-phenotype had the greatest percentage of patients (37%) who tested SARS-CoV-2-positive during the initial COVID-19 wave (between March and June 2020). Its patients had an elevated burden of comorbidities and were largely prescribed medications for anemia, circulatory disorders, and endocrine disorders.

Sub-phenotype 2 comprised 6,838 (33%) patients and was dominated by respiratory disorders, anxiety, and sleep disorders (T-4, 7, 9), along with chest pain and headaches. The median patient age was 51 years, 63% were female, and 31% were hospitalized during acute COVID-19.

The sub-phenotype had the greatest fraction (65%) of patients diagnosed with COVID-19 between November 2020 and November 2021. Sub-phenotype 2 individuals were largely prescribed anti-allergy, anti-inflammatory, and anti-asthma medications, such as inhaled steroids, montelukast, and levalbuterol.

Sub-phenotype 3 comprised 23% (n=4,879) of individuals and was dominated by disorders of the nervous and musculoskeletal systems (T-1, 5, 6), including musculoskeletal pain, sleep disorders, and headaches. The median patient age was 57 years, and 61% were female. This sub-phenotype had the greatest percentage of individuals with more than five outpatient visits before COVID-19 (78%). Its patients were mostly prescribed analgesic medications (such as ketorolac and ibuprofen).

Sub-phenotype 4 comprised 10% (n=2,117) of individuals, with mainly respiratory and digestive disorders (T-2, 4, 8). The median patient age was 54 years, and 62% were female. This group had the highest proportion of patients with no emergency department visits (57%) and the lowest rates of mechanical ventilation (1%) and critical care admission (3%) during acute COVID-19. Its patients were largely prescribed medications for digestive system disorders.
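The four group sizes reported above can be checked against the development cohort with simple arithmetic. The sketch below uses only the counts quoted in this article; the variable names are illustrative.

```python
# Patient counts per sub-phenotype, as reported in the article.
sizes = {1: 7047, 2: 6838, 3: 4879, 4: 2117}
development_cohort = 20881

total = sum(sizes.values())
# Percentage share of each sub-phenotype, rounded to whole percent.
shares = {k: round(100 * n / total) for k, n in sizes.items()}
```

The counts sum to the development cohort size (n=20,881), and the rounded shares reproduce the reported 34%, 33%, 23%, and 10%.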

The topics learned from SARS-CoV-2-negative individuals showed greater entropy values than those learned from SARS-CoV-2-positive patients. Cosine similarity findings confirmed the robustness of the PASC sub-phenotype classification, and the co-incidence patterns observed in the two CRN cohorts were similar for SARS-CoV-2-positive individuals. In contrast, the topics learned from uninfected individuals were dissimilar to those learned from SARS-CoV-2-positive individuals and showed less concentrated patterns.

Conclusion

Overall, the study findings highlighted four reproducible data-driven PASC sub-phenotypes identified by machine learning. The findings could aid health authorities in improving PASC management.

Journal reference:

Zhang, H. et al. (2022) "Data-driven identification of post-acute SARS-CoV-2 infection subphenotypes", Nature Medicine. doi: 10.1038/s41591-022-02116-3. https://www.nature.com/articles/s41591-022-02116-3


Written by

Pooja Toshniwal Paharia

Pooja Toshniwal Paharia is an oral and maxillofacial physician and radiologist based in Pune, India. Her academic background is in Oral Medicine and Radiology. She has extensive experience in research and evidence-based clinical-radiological diagnosis and management of oral lesions and conditions and associated maxillofacial disorders.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Toshniwal Paharia, Pooja. (2022, December 06). Machine learning analysis suggests that there are four sub-phenotypes of long COVID. News-Medical. Retrieved on February 09, 2026 from https://www.news-medical.net/news/20221206/Machine-learning-analysis-suggests-that-there-are-four-sub-phenotypes-of-long-COVID.aspx.
