The virus behind coronavirus disease 2019 (COVID-19), which is causing immense human devastation worldwide, is the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It is the third closely related, highly pathogenic zoonotic coronavirus to cause a serious outbreak in less than 20 years.
This indicates that the current pandemic is most probably not the last. A new study, recently published in Experimental & Molecular Medicine, examines the origin and continuing evolution of the virus in the human and animal co-hosts in an attempt to help prepare for future destructive outbreaks.
Multiple coronaviral pathogens
In addition to the highly pathogenic coronaviruses called SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), endemic coronaviruses have also been recognized. The high prevalence of coronaviruses among humans, as well as the close interactions between humans and the animal hosts of these viruses, signify the strong possibility of ongoing zoonotic outbreaks.
SARS-CoV, MERS-CoV, and SARS-CoV-2 all belong to the betacoronavirus group of viruses, which are found abundantly among bats and other mammals: SARS-CoV and SARS-CoV-2 fall in the sarbecovirus subgroup, and MERS-CoV in the merbecovirus subgroup. The four known endemic seasonal coronaviruses that cause the common cold include both alpha- and betacoronaviruses, with the betacoronaviruses among them belonging to another subgroup called embecovirus.
Coronaviruses have large RNA genomes, about 30 kb in size, that encode four structural proteins: the spike, envelope, membrane, and nucleocapsid proteins. The genome also encodes several non-structural proteins within an open reading frame (ORF), which are essential for viral replication and transcription.
Frequent mutations and recombination
Over 60% of the transcriptome within infected human cells was found to be viral, indicating that the virus practically hijacks cell functions once it establishes a successful infection. Interestingly, both partial transcripts and noncanonical fusion transcripts have been observed, underlining the occurrence of frequent recombination events within the hosts.
Both synonymous and non-synonymous mutations occur in viral genomes. Synonymous mutations do not alter the encoded amino acids, whereas non-synonymous mutations do. Since only non-synonymous mutations are subject to further refining processes such as purifying natural selection, which weeds out deleterious changes, they are much less common than synonymous mutations.
In human proteins, synonymous mutations outnumber non-synonymous mutations five to one. In SARS-CoV-2, however, non-synonymous mutations are rarer still, at one for every fifty synonymous mutations.
This indicates that strong purifying pressures are at work to remove harmful mutations, even more than observed with other coronaviruses.
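To make the distinction concrete, here is a minimal Python sketch that classifies codon changes as synonymous or non-synonymous using the standard genetic code. The function names `classify` and `count_mutations` and the short example sequences are illustrative assumptions, not taken from the study, and the sketch assumes two aligned, in-frame coding sequences of equal length with no gaps.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(ref_codon, alt_codon):
    """Label a codon change by whether the encoded amino acid changes."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    return "non-synonymous"


def count_mutations(ref_seq, alt_seq):
    """Count changed codons of each kind in two aligned coding sequences."""
    if len(ref_seq) != len(alt_seq):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"synonymous": 0, "non-synonymous": 0}
    usable = len(ref_seq) - len(ref_seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        ref_codon, alt_codon = ref_seq[i:i + 3], alt_seq[i:i + 3]
        if ref_codon.upper() != alt_codon.upper():
            counts[classify(ref_codon, alt_codon)] += 1
    return counts


# Made-up example: one Asp-to-Gly change (GAT to GGT) and one silent
# Leu change (CTT to CTC); the third codon is unchanged.
print(count_mutations("GATCTTAAA", "GGTCTCAAA"))
```

Dividing the two counts gives a crude version of the ratio discussed above. Published estimates normalize by the number of synonymous and non-synonymous sites, which this sketch does not attempt.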
Synonymous mutations are mostly affected by the baseline mutation rates and by genetic drift, that is, the randomness with which various alleles appear or disappear in succeeding generations.
Knowing the rate at which mutations accumulate in the coronavirus genome, it is possible to calculate approximately how far back two strains have to go to reach the most recent common ancestor (MRCA). Synonymous mutations are used to estimate this period due to the more stable rate at which they appear.
Comparing only synonymous mutations shows that SARS-CoV-2 is only 83% identical to the Chinese horseshoe bat coronavirus RaTG13, though they have an overall identity of 96%. This means that they diverged a much longer time ago than the overall identity might suggest.
The estimated MRCA is likely to be at least 30 years in the past, suggesting that other, as yet undiscovered, coronaviruses exist that are much more closely related to the current virus. Coronaviruses from bats and other mammals should be monitored to identify closer descendants of the MRCA.
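The dating logic described above can be sketched in a few lines of Python. Under a strict molecular clock, differences accumulate along both lineages after the split, so the time back to the MRCA is roughly the genetic distance divided by twice the substitution rate. The sketch below applies the Jukes-Cantor correction for repeated changes at the same site. The 83% synonymous identity comes from the text, but the rate passed in is a placeholder assumption, and the function names are hypothetical. This is a simplified illustration, not the method used in the study.

```python
import math


def jukes_cantor_distance(p_diff):
    """Convert an observed fraction of differing sites into an estimated
    number of substitutions per site (Jukes-Cantor correction)."""
    if not 0 <= p_diff < 0.75:
        raise ValueError("p_diff must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p_diff / 3)


def years_to_mrca(identity, rate_per_site_per_year):
    """Estimate years back to the MRCA, assuming a constant clock rate.

    The distance is split over two diverging lineages, hence the factor 2.
    """
    distance = jukes_cantor_distance(1 - identity)
    return distance / (2 * rate_per_site_per_year)


# ASSUMPTION: this rate is a placeholder for illustration only; it is not
# a value reported in the article or the underlying study.
assumed_rate = 1e-3  # substitutions per synonymous site per year

print(years_to_mrca(0.83, assumed_rate))  # synonymous-site identity
print(years_to_mrca(0.96, assumed_rate))  # overall identity, for contrast
```

Whatever rate is assumed, the lower synonymous identity yields a much older split than the overall identity would, which is the point the paragraph above makes.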
Recombination events
The study also suggests that different parts of the SARS-CoV-2 genome evolve at different rates from each other, and may thus be closer to some other coronaviruses than to RaTG13.
“The genome of SARS-CoV-2 can be considered a combination of several ‘recombination blocks’ or regions between inferred breakpoints for recombination events.”
Recombination events in the viral spike protein are especially important in the current pandemic, as these mutations have given rise to most of the variants of concern observed recently. The spike protein is crucial to the pandemic, being the protein that engages with the human host cell receptor, the angiotensin-converting enzyme 2 (ACE2), to mediate attachment and virus-cell membrane fusion.
The spike protein is a common site of mutation, and its receptor-binding domain (RBD) differs more from that of RaTG13 than other regions do, indicating that this may be the key to the higher binding affinity shown by SARS-CoV-2 for ACE2. The spike protein sequence appears to be an array of segments that have descended separately from the MRCA.
Four recombination regions, called regions 1, 2, 3 and 4 (R1 to R4), have been described, covering open reading frame 1b (ORF1b), the 5’ spike region, the variable loop region of the spike, and the nucleocapsid protein. Phylogenetically, SARS-CoV-2 is closest to RaTG13 in R1, R2, and R4, but in R3 it is closest to a pangolin coronavirus.
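One simple way to see how different genome regions can point to different closest relatives is to compare sequence identity window by window. The Python sketch below does this for a query against several aligned reference sequences. The function names, the reference labels `bat_like` and `pangolin_like`, and the toy sequences are all made up for illustration. Real recombination analyses rely on full multiple sequence alignments and phylogenetic methods, so this is only a rough stand-in for the idea.

```python
def window_identity(seq_a, seq_b, start, size):
    """Fraction of matching positions in one window, skipping alignment gaps."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a[start:start + size], seq_b[start:start + size])
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def closest_reference_by_window(query, references, size, step):
    """For each window, report which aligned reference is most similar."""
    results = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, seq, start, size)
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores))
    return results


# Toy aligned sequences (hypothetical): the first half of the query matches
# one reference and the second half matches the other, mimicking a
# recombinant genome with a single breakpoint.
query = "AAAAAAAACCCCCCCC"
references = {
    "bat_like": "AAAAAAAAGGGGGGGG",
    "pangolin_like": "TTTTTTTTCCCCCCCC",
}

for start, best, scores in closest_reference_by_window(query, references, size=8, step=8):
    print(start, best, scores)
```

A switch in the best-matching reference between neighboring windows hints at a possible breakpoint, which dedicated tools would then test statistically.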
The pangolin coronavirus is most closely related to the MRCA of both SARS-CoV-2 and RaTG13. Though it has not yet been verified that the pangolin coronavirus was a genuine infective agent, or that the pangolin is an intermediate host for the virus, the diverse phylogenetic patterns in different segments of the current virus indicate that different coronaviruses do undergo coinfection and recombination. This indicates the need to take care to prevent such events during human interactions with wild mammals.
Strong purifying selection
Most researchers have concluded that most of the SARS-CoV-2 genome shows the effects of purifying selection, but some positions may have also undergone positive selection. Many of these are in the spike protein, and favor engagement with the ACE2 receptor.
Some mutations that favor the virus may not act on proteins at all. Instead, they may affect the stability of the RNA template used for transcription, which in turn influences viral replication.
Adaptive evolution
The researchers also point out that with the identification of 12 or more major lineages of SARS-CoV-2 to date, it is important to explore the functional consequences of the single nucleotide polymorphisms (SNPs) found in many of them.
Such an understanding may help to predict how the pandemic will develop, thus shaping preventive and therapeutic measures.
Early on, the D614G mutation became dominant globally, but the reason is still unclear, since it does not appear to increase virus-receptor binding affinity. However, it does boost the viral load.
With over a hundred million infections, new variants have become fixed worldwide, while founder effects or genetic drift can allow some mutations to become far more common. This indicates that the spike protein continues to produce variants as a result of mutation and recombination.
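The role of chance in this process can be illustrated with a minimal Wright-Fisher style simulation in Python, in which each generation is formed by random sampling from the previous one with no selection at all. The population size, starting frequency, and function name are arbitrary assumptions chosen for illustration. The sketch models no real viral population.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=0):
    """Track one neutral variant's frequency under pure random sampling."""
    rng = random.Random(seed)
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each individual in the next generation inherits the variant
        # with probability equal to its current frequency.
        copies = sum(rng.random() < freq for _ in range(pop_size))
        freq = copies / pop_size
        history.append(freq)
        if freq in (0.0, 1.0):  # variant lost or fixed; nothing more changes
            break
    return history


# Small populations, as after a founder event, drift much faster than
# large ones, so a neutral variant can be fixed or lost by chance alone.
for size in (20, 2000):
    trajectory = wright_fisher(pop_size=size, start_freq=0.1, generations=200, seed=1)
    print(size, len(trajectory) - 1, trajectory[-1])
```

Each run prints the population size, the number of generations simulated, and the final frequency of the variant. Changing the seed shows how much outcomes vary from chance alone.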
What are the implications?
It appears that SARS-CoV-2 first arose in non-human mammals via a combination of recombination and purifying selection.
As millions more people become infected, the virus is likely to spread to animals that have close contact with humans, and from them to other animals. This, in turn, could trigger the emergence of still more dangerous variants through recombination.
“It is imperative that epidemiological, genetic, and functional studies of variants be fully utilized to determine how to slow down and ultimately eradicate within- and between-species transmissions.”