In a recent study posted to the medRxiv preprint server, researchers identified recurrent mutations in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2).
This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.
Background
Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, typically subsides within a few weeks, but in some cases the virus persists for an extended period, a condition termed long-term COVID-19. Immunodeficient patients with long-term COVID-19 could be a source of novel genomic variations in SARS-CoV-2 that are critical to its evolution. Previous studies have speculated that the SARS-CoV-2 Alpha and Omicron variants could have emerged from long-term infections.
The study
The present study analyzed recurrent mutations in SARS-CoV-2 and how frequently they recur. For the analysis, the researchers built a dataset comprising 168 SARS-CoV-2 genomic sequences from about 28 immunodeficient patients. Genome series associated with individual patients were identified through a literature search for case studies, and further genome series were obtained from the COVID-19 Genomics UK (COG-UK) dataset.
A genome series was included if 1) two genomes were available in public databases, 2) long-term COVID-19 had persisted for 28 days or more, and 3) there was sufficient clinical evidence that the patient was immunodeficient. A civet report was generated for each genome series to confirm that the genomes resulted from long-term COVID-19. All genomic sequences within a series were stored in a single multi-FASTA file, with headers giving the patient identifier and the number of days since the first available genome in that series.
Mutation calling was automated, with the genomes aligned to an annotated SARS-CoV-2 reference sequence. De novo mutations (DNMs), that is, mutations observed in a genome series but absent on ‘day 0’ of that series, were identified by processing the mutation calls.
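The file layout and the first two inclusion criteria can be sketched in Python. This is only an illustration under stated assumptions: the header format `>patient_id|day` is hypothetical (the article does not give the real one), "two genomes" is read as "at least two", the span between the first and last sample stands in for the duration of infection, and the function names and sample records are invented. The third criterion, clinical evidence of immunodeficiency, cannot be checked from sequence files.

```python
from collections import defaultdict

def parse_series(fasta_text):
    """Group sequences by patient from multi-FASTA text.

    Assumes a hypothetical header of the form '>patient_id|day'.
    Returns {patient_id: {day: sequence}}.
    """
    series = defaultdict(dict)
    patient, day = None, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            patient, day_text = line[1:].split("|")
            day = int(day_text)
            series[patient][day] = ""
        elif patient is not None:
            # Sequence lines may wrap, so append to the current record.
            series[patient][day] += line
    return dict(series)

def eligible(series, min_genomes=2, min_days=28):
    """Keep series with enough genomes and a long enough sampling span."""
    return {
        patient: genomes
        for patient, genomes in series.items()
        if len(genomes) >= min_genomes
        and max(genomes) - min(genomes) >= min_days
    }

# Made-up records, for illustration only.
example = """>P01|0
ACGT
ACGT
>P01|35
ACGA
>P02|0
ACGT
>P02|10
ACGT
"""
kept = eligible(parse_series(example))
```

Here only the hypothetical patient P01 is kept, because the P02 series spans just 10 days.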
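The idea of a de novo mutation, present later in a series but absent from its first sample, amounts to a set difference. The Python sketch below assumes mutation calls are already available as sets of labels per sampling day; the function name and the example calls are invented for illustration and do not come from the study's pipeline.

```python
def de_novo_mutations(calls_by_day):
    """Return {day: mutations absent from the earliest sample of the series}."""
    first_day = min(calls_by_day)
    baseline = set(calls_by_day[first_day])
    return {
        day: set(calls) - baseline
        for day, calls in sorted(calls_by_day.items())
        if day != first_day
    }

# Made-up mutation calls for one hypothetical patient.
calls = {
    0: {"S:D614G", "N:R203K"},
    40: {"S:D614G", "N:R203K", "S:E484K"},
    75: {"S:D614G", "N:R203K", "S:E484K", "E:T30I"},
}
dnms = de_novo_mutations(calls)

# All DNMs seen at any point in the series.
all_dnms = set().union(*dnms.values())
```

Mutations already present in the first sample, such as S:D614G in this toy series, are treated as part of the infecting lineage and are never counted as DNMs.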
Findings
The authors found that the SARS-CoV-2 spike (S) gene carried the most recurrent mutations, including ten amino acid (aa) substitutions. Within the S gene, DNMs were concentrated in the receptor-binding domain (RBD), which had the most aa loci with DNMs (seven), followed by the N-terminal domain (NTD) with five and the signal peptide (SP) with one. The S:E484K mutation was the most frequent DNM, with eight occurrences. Counting all DNMs at this locus (S:484) together raised the number of occurrences to 12, indicating that the site is enriched for DNMs.
Recurrent deletions were observed only in the NTD of the S gene: the S:Δ67 region (recurrent deletion region 1, or RDR1), the S:Δ138 region (RDR2), and the S:Δ243 region (RDR4). The S gene, which makes up only about one-eighth of the SARS-CoV-2 genome, accounted for about 34% of all DNM occurrences and 59% of recurrent DNMs.
Genes other than S and ORF1ab had a lower frequency of DNM occurrences. E:T30I was the only recurrent DNM in the envelope (E) gene, with six occurrences, and M:H125Y was the lone recurrent DNM in the membrane (M) gene, with four occurrences. ORF1ab carried many DNMs but relatively few recurrent ones compared with the S gene: 86 of the 195 DNMs were found in ORF1ab, but only six of the 21 recurrent DNMs.
Comparing the recurrent DNMs with the United Kingdom Health Security Agency’s (UKHSA) variant of concern (VOC) and variant under investigation (VUI) definition files showed that the S:E484K substitution, the most frequent DNM, made 11 appearances in those files. Of the 21 recurrent DNMs observed in the study, nine were defining mutations for a VOC or VUI.
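Counting occurrences per mutation and then per locus can be sketched as below. Two assumptions are made that the article does not confirm: a DNM is counted once per patient, and "recurrent" means arising in at least two patients. The helper names and the three patient records are hypothetical.

```python
import re
from collections import Counter

def locus(mutation):
    """Reduce a label such as 'S:E484K' to its locus, 'S:484'."""
    gene, change = mutation.split(":")
    position = re.search(r"\d+", change).group()
    return f"{gene}:{position}"

def count_occurrences(dnms_by_patient, key=lambda m: m):
    """Count the number of patients in which each DNM (or locus) arose."""
    counts = Counter()
    for dnms in dnms_by_patient.values():
        # The set ensures each patient contributes at most once per key.
        counts.update({key(m) for m in dnms})
    return counts

# Made-up DNM sets for three hypothetical patients.
dnms_by_patient = {
    "P01": {"S:E484K", "E:T30I"},
    "P02": {"S:E484K"},
    "P03": {"S:E484Q", "M:H125Y"},
}
by_mutation = count_occurrences(dnms_by_patient)
by_locus = count_occurrences(dnms_by_patient, key=locus)

# Assumed threshold: a DNM is "recurrent" if seen in two or more patients.
recurrent = {m for m, n in by_mutation.items() if n >= 2}
```

In this toy data, S:E484K arises in two patients, while grouping by locus also picks up S:E484Q and raises the count for S:484 to three. This is the same effect as the reported rise from eight to 12 occurrences.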
Conclusions
The most significant observation was the DNM frequency in the RBD. The RBD, which makes up just 2% of the length of the genome, accounted for about 17% of all observed DNMs, indicating strong mutational selection in immunocompromised individuals. Moreover, 12 DNMs were observed at the S:484 locus, highlighting strong selective pressure at this site, with the S:E484K substitution being the most frequent DNM.
All recurrent deletions in the SARS-CoV-2 genome were found in the NTD of the S gene. The S:Δ138/RDR2 region, which forms part of the NTD antigenic supersite targeted by most neutralizing antibodies, had four de novo occurrences.
To summarize, the researchers identified recurrent mutations associated with long-term COVID-19 in immunodeficient subjects. Most of the observed recurrent DNMs are related to immune evasion, enhanced binding affinity to the host receptor, or improved virus packaging. Still, they are generally less prevalent in the wider SARS-CoV-2 population.
Based on these findings, the authors posit that long-term infection might select for mutations that partly aid intra-host replication and persistence, unlike in the general SARS-CoV-2 population, where mutations are strongly selected for inter-host transmission.
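The scale of this over-representation can be checked with simple arithmetic on the percentages quoted in this article. The Python sketch below is a naive ratio and not a statistical test: it assumes that, without selection, DNMs would be spread in proportion to sequence length. The function name is illustrative.

```python
def fold_enrichment(share_of_dnms, share_of_genome):
    """Ratio of the observed DNM share to the share expected from length alone."""
    return share_of_dnms / share_of_genome

# RBD: about 17% of DNMs in about 2% of the genome.
rbd = fold_enrichment(0.17, 0.02)

# S gene: about 34% of DNM occurrences in about one-eighth of the genome.
s_gene = fold_enrichment(0.34, 1 / 8)
```

Under that assumption, the RBD carries roughly 8.5 times as many DNMs as its length would predict, and the S gene as a whole about 2.7 times as many.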
Article Revisions
- May 11 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.