In view of increasing case numbers of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), it is important to develop accurate methods of risk prediction to help treat those at the highest risk.
A new study in the journal JAMA Network Open reports on the disease severity associated with some clades of the virus, based on an analysis of genomes from specimens isolated in the initial phase of the pandemic.
Study: Genomic Epidemiology of SARS-CoV-2 Infection During the Initial Pandemic Wave and Association With Disease Severity.
The course of the first wave
The US has been hit by three successive waves of COVID-19, with a total of over 27 million infections. Early cases came from infections imported by travelers. These were accompanied by high rates of hospital admission and high case fatality rates (CFR) in the elderly as well as in those with pre-existing illnesses.
As community transmission became more important, the length of hospital stays and the CFR both came down, even as hospitals were crowded. This was attributed to the evolution of efficient treatment protocols, ignoring the question as to whether viral evolution itself played an important role.
Genomic sequencing has been crucial to the current understanding of the pandemic. Six different clades of the virus have been identified on the Global Initiative on Sharing All Influenza Data (GISAID) database, namely, S, L, V, G, GH, and GR, also known as the lineages A, B, B.2, B.1, B.1.*, and B.1.1.1, respectively.
These are different from the ancestral Wuhan clade. The three G clades (G, GH and GR) all contain the D614G variant of the spike gene, which is thought to be more infectious but less virulent than the Wuhan strain.
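As a rough illustration of the grouping described above, the following Python sketch records which of the six GISAID clades carry D614G and computes the share of D614G-carrying isolates in a list. The function names and the sample list are hypothetical and chosen only for this example; they are not taken from the study.

```python
from __future__ import annotations

# The six GISAID clades named in the article.
GISAID_CLADES = ("S", "L", "V", "G", "GH", "GR")

# The three G clades, which the article describes as carrying spike D614G.
D614G_CLADES = frozenset({"G", "GH", "GR"})


def carries_d614g(clade: str) -> bool:
    """Return True if the clade is one of the three D614G-carrying G clades."""
    return clade in D614G_CLADES


def d614g_share(clades: list[str]) -> float:
    """Fraction of isolates in the list that belong to a D614G-carrying clade."""
    if not clades:
        raise ValueError("no isolates given")
    return sum(carries_d614g(c) for c in clades) / len(clades)


# Hypothetical clade assignments for illustration only, not study data.
sample_isolates = ["G", "GH", "S", "V"]
print(d614g_share(sample_isolates))
```

Any clade label outside the three G clades, including the ancestral Wuhan clade, is simply counted as not carrying D614G in this sketch.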
Study aims and population
The current study aimed to match the variants with disease severity and clinical outcomes, to understand how the virus adapts to its human hosts.
The study was carried out on viral specimens isolated from 300 patients with confirmed SARS-CoV-2 infection. The median age of the patient group was 53 years, and 60% were female.
Over two-thirds were white, and over 40% were healthcare staff. The median cycle threshold (Ct) for the specimens was 19.4. No samples with a Ct over 30 were used.
A third of all patients had moderate or severe disease, requiring hospitalization, and a tenth required intensive medical care. The mortality was 6%.
Mutations and their variation
More than 2,500 variants were identified in total, most of them recurring across specimens, of which 484 were unique. Over half of the 484 unique variants were missense mutations, and a third were silent mutations.
Most mutations (62%) occurred in the open reading frame ORF1ab, while 13%, 6% and 4% were in the viral spike (S), nucleocapsid (N) and ORF3a, respectively. The most frequent non-synonymous mutations were the spike D614G and the ORF1ab P323L.
The latter was not associated with greater severity of illness, however.
Clades by phase
In the initial phase, six different clades of the virus were in circulation: five of the six GISAID clades, along with the ancestral Wuhan clade. The seven-day rolling case average rose to its peak on April 11, 2020, before registering a decline. During this period, group 2 clades, comprising the D614G-containing G, GR and GH clades, were predominant, at 84% of all identified viral isolates.
Of the remaining 16%, or group 1 isolates, none belonged to the L clade. The Wuhan clade comprised part of this 16%.
Patients in group 2 tended to be younger than those in group 1, with median ages of 50 and 62 years, respectively. Patients infected with the GR clade were the youngest in this study group, at a median of 41 years, and those infected with the Wuhan strain were the oldest, at 68 years.
Both sexes and all ethnicities were equally represented over all clades and both groups.
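For readers unfamiliar with the seven-day rolling case average mentioned above, here is a minimal Python sketch of how such a trailing average can be computed from daily counts. The function name and the case counts are hypothetical and serve only as an illustration; they are not data from the study.

```python
from __future__ import annotations


def rolling_mean(daily_cases: list[int], window: int = 7) -> list[float]:
    """Trailing mean over `window` days, starting once a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    running = 0
    for i, count in enumerate(daily_cases):
        running += count
        if i >= window:
            # Drop the day that has just fallen out of the window.
            running -= daily_cases[i - window]
        if i >= window - 1:
            means.append(running / window)
    return means


# Hypothetical daily counts for illustration only, not data from the study.
example_counts = [10, 12, 15, 20, 18, 25, 30, 28, 26, 22]
print(rolling_mean(example_counts))
```

Averaging over seven days smooths out day-of-week reporting effects, which makes the timing of a peak easier to read than it is from raw daily counts.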
The rapid shift to group 2
Group 1 clades circulated at higher levels initially, but the diversity rapidly reduced within the next two weeks. Thus, at the end of the study period, just five weeks from the start, group 2 clades were dominant, making up 90% of all circulating strains.
Non-healthcare staff had a higher proportion of group 1 clades, about one in twenty.
Clinical outcomes
The variants associated with a lower risk of hospitalization included L4182F in ORF1ab and S24L in ORF8. The D614G spike variant was linked to higher survival among hospitalized patients, at 87%, as were several group 2 clade variants such as 241C>T, 3037C>T, and 14408C>T.
However, no clades were associated with a lower chance of hospitalization or intensive care unit admission. Mortality was higher with clade V, and with group 1 clades compared to group 2.
Interleukin-6, creatinine, and D-dimer levels were significantly different among different variants, while creatinine levels were also higher among clade V patients.
With the analysis of multiple variables, as expected, male patients and the elderly had a higher risk of hospitalization. Still, no clade or clade group showed significant association with this outcome. Non-D614G-containing variants had a higher risk of mortality, while ORF3a variants had a favorable prognosis.
Mortality rates were higher in the elderly, immunosuppressed and those with high creatinine levels, and in group 1 clades.
What are the implications?
The researchers point out that many sequencing studies of SARS-CoV-2 take place after the initial diversity has decreased, which may account for their failure to identify differences in outcomes between clades. The overwhelming dominance of D614G also blurs smaller differences between individual clades.
To overcome this, the current study used data from early specimens with a representative sampling of the Cleveland patient community in the first wave. At this point, most patients were older, with lung and heart disease, and mostly came from disadvantaged communities.
At this time, five of the six clades identified so far were in circulation, as well as the Wuhan strain. “Such early diversity is consistent with the interpretation that multiple SARS-CoV-2 infection events occurred in this community through repeated introduction of viruses from Asia, Europe, and elsewhere within the US.”
It is important to note that the large shift to the D614G variants is attributed to their increased infectivity. It is thought that this mutation alters the electrostatic interactions between the different subunits of the spike protein, making it more fusogenic and increasing its binding affinity to its host cell receptor, the angiotensin-converting enzyme 2 (ACE2).
It is interesting that although group 1 was already well established in the community by the time group 2 clades began to emerge, the latter rapidly rose to dominance, indicating their fitness advantage. The travel bans may have helped by preventing the introduction of new, especially fitter, clades, thus also limiting the mortality.
Since the study period covered only the first few weeks of the pandemic in the US, improvements in treatment and the availability of intensive care beds and antivirals are probably not the reason for the reduction in mortality despite the high number of hospitalizations. Also interestingly, the clades that faded out, namely, the S, V and Wuhan clades, were more virulent than the D614G and other later clades.
The researchers suggest, “Our findings demonstrate that the continued evolution of SARS-CoV-2 leads to less virulence.”
Clade V is defined by the L37F ORF1ab and the G251V ORF3a variants, which alter the non-structural proteins NSP6 and NS3, respectively. The first may enhance asymptomatic spread, and the second is associated with reduced protein flexibility. If so, the latter may cause some antibody binding sites to be lost, resulting in immune escape. The binding affinity is also very low relative to the wildtype virus.
The association of clade V with higher creatinine levels compared to other clades may indicate that it causes renal injury in particular. Further studies are required to compare viral genomes in patients with and without renal dysfunction. Such clade assignments may help predict the clinical outcomes of future infections.