Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and severe acute respiratory syndrome coronavirus (SARS-CoV) are members of the subgenus Sarbecovirus, whose genomes are approximately 30,000 nucleotides long. These viruses encode four structural proteins: spike (S), envelope (E), membrane (M), and nucleocapsid (N).
Background
The S protein contains a Receptor Binding Domain (RBD) that binds to the host receptor, angiotensin-converting enzyme 2 (ACE2). The RBD is the most variable region of coronaviruses and determines the host range of each member of this group.
To date, the evolutionary origin of SARS-CoV-2 has not been elucidated. Because genomic sequence analysis indicates that its closest relatives are bat Sarbecoviruses, it is highly likely that the human-infecting SARS-CoV-2 strain emerged after spillover from bats, either directly or via an intermediate host.
Like other coronaviruses, Sarbecoviruses are highly recombinant, and analysis of the SARS-CoV-2 RBD sequence indicated that its origin involved genetic recombination. Of the four key hypotheses on the origin of SARS-CoV-2, three involve recombination.
Detecting recombination events is essential for analyzing the evolutionary history of the S gene, particularly in SARS-CoV-2. Although several tools, such as SimPlot, RDP4, and GARD, have been used to identify recombination among Sarbecoviruses, they do not estimate Ancestral Recombination Graphs (ARGs), which characterize the reticulate evolution caused by genetic recombination. As a result, the estimation of recombination events remains subject to phylogenetic uncertainty.
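To make the contrast concrete, here is a sketch of the kind of sliding-window similarity scan that SimPlot-style analyses rely on. It is a simplified illustration, not SimPlot's implementation: it uses raw percent identity rather than an evolutionary distance model, and the function name `window_identity` and the toy sequences are hypothetical. A switch in which putative parent is most similar to the query hints at a breakpoint, but no ancestral recombination graph is reconstructed.

```python
from __future__ import annotations


def window_identity(query: str, subject: str, window: int, step: int) -> list[tuple[int, float]]:
    """Fraction of identical sites between two aligned sequences in sliding windows.

    Returns (window midpoint, identity) pairs. Columns where either sequence
    has a gap are skipped; windows containing only gaps are omitted.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(query[start:start + window], subject[start:start + window])
            if a != "-" and b != "-"
        ]
        if pairs:
            identity = sum(a == b for a, b in pairs) / len(pairs)
            results.append((start + window // 2, identity))
    return results


# Hypothetical toy alignment: the query resembles parent_a on the left
# half and parent_b on the right half.
query = "AAAAAAAAAACCCCCCCCCC"
parent_a = "AAAAAAAAAAGGGGGGGGGG"
parent_b = "TTTTTTTTTTCCCCCCCCCC"
print(window_identity(query, parent_a, 10, 10))  # [(5, 1.0), (15, 0.0)]
print(window_identity(query, parent_b, 10, 10))  # [(5, 0.0), (15, 1.0)]
```

Here the query matches `parent_a` in the first window and `parent_b` in the second, suggesting a breakpoint near site 10. The scan says nothing about when the recombination happened or which ancestral lineages were involved.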
Although reconstructing ARGs from sequence data is extremely difficult, several software packages, such as the BEAST2 package Bacter, have been developed to address this challenge. The ClonalOrigin model in Bacter has been used to estimate a specific type of ARG, known as an Ancestral Conversion Graph (ACG). This method can also be used to estimate recombination events within a phylogeny.
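The following is a minimal sketch of what an ACG records, assuming a heavily simplified representation: a clonal-frame tree stored as a child-to-parent map, plus conversion edges that each copy a contiguous stretch of the alignment from a donor lineage into a recipient lineage. The class names `Conversion` and `ACG` are hypothetical and are not Bacter's API; in the ClonalOrigin model, conversion edges also depart from and attach to specific times on the clonal frame, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Conversion:
    """One conversion edge (hypothetical, simplified): a fragment copied between lineages."""
    donor: str      # clonal-frame lineage the fragment departs from
    recipient: str  # clonal-frame lineage the fragment attaches to
    start: int      # first affected alignment site (0-based, inclusive)
    end: int        # one past the last affected site (exclusive)


@dataclass
class ACG:
    """Simplified ACG: a clonal-frame tree plus a list of conversion edges."""
    clonal_frame: dict[str, str | None]  # child -> parent; the root maps to None
    conversions: list[Conversion] = field(default_factory=list)

    def conversions_at(self, site: int) -> list[Conversion]:
        """Conversions covering `site`; an empty list means the site follows the clonal frame."""
        return [c for c in self.conversions if c.start <= site < c.end]


# Hypothetical toy example: three sampled lineages A, B, C and two conversions.
acg = ACG(
    clonal_frame={"A": "AB", "B": "AB", "AB": "root", "C": "root", "root": None},
    conversions=[Conversion("C", "A", 100, 250), Conversion("B", "C", 400, 420)],
)
print([c.donor for c in acg.conversions_at(120)])  # ['C']
print(acg.conversions_at(300))                      # []
```

Sites not covered by any conversion inherit the clonal-frame tree, which is how a single ACG can describe different local genealogies along the alignment.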
About the study
A study currently posted on the Research Square* preprint server while under consideration for publication in Scientific Reports has employed Bacter to identify recombination events within the RBD region of the Sarbecovirus genome. The main aim of this study was to determine the origin of the amino acids located in the RBD's variable loop, which is responsible for the high affinity of SARS-CoV-2 for the ACE2 receptors of human cells.
A total of eighty-seven genomes were obtained from the GenBank and GISAID databases. The sequences were aligned using MAFFT 7.475 with default settings, and the Gblocks program was used to detect poorly aligned amino acid positions. The regions corresponding to the SARS-CoV-2 RBD were then extracted from the full-genome alignment and analyzed using the Bayesian skyline coalescent model.
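Extracting a region such as the RBD from a whole-genome alignment requires mapping positions in the ungapped reference sequence to alignment columns, because gaps shift the coordinates. A minimal sketch under that assumption follows; the function names, sequence names, toy alignment, and coordinates are all hypothetical, and this is not the study's actual script.

```python
from __future__ import annotations


def ref_to_columns(ref_aligned: str) -> list[int]:
    """Map each ungapped reference position (0-based) to its alignment column."""
    return [col for col, base in enumerate(ref_aligned) if base != "-"]


def extract_region(alignment: dict[str, str], ref_id: str, start: int, end: int) -> dict[str, str]:
    """Slice every sequence to the columns spanned by reference positions [start, end).

    All columns between the first and last mapped column are kept, including
    columns where the reference itself has a gap.
    """
    cols = ref_to_columns(alignment[ref_id])
    first, last = cols[start], cols[end - 1]
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


# Hypothetical toy alignment; "ref" plays the role of the SARS-CoV-2 reference.
aln = {
    "ref":    "ATG--CCGTTA",
    "virus1": "ATGAACCGTTA",
    "virus2": "ACG--CTGTCA",
}
region = extract_region(aln, "ref", 2, 6)
print(region)  # {'ref': 'G--CCG', 'virus1': 'GAACCG', 'virus2': 'G--CTG'}
```

Keeping every column between the first and last mapped reference position retains insertions that other sequences carry relative to the reference, as `virus1` does here.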
Study findings
A recombination-aware phylogenetic analysis of the RBD region was performed on thirty-nine Sarbecoviruses. Multiple recombination events with a posterior probability greater than 0.5 were detected among viruses associated with different Rhinolophus species, indicating close interaction between these bat populations.
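Posterior support for a recombination event is commonly estimated as the fraction of MCMC samples in which that event appears. The sketch below assumes each sampled graph has already been summarised as a set of discrete event keys (hypothetical donor, recipient, and region labels); real recombination-aware tools must also decide which events in different samples count as "the same" event, which is considerably more involved. It keeps events whose support is strictly greater than 0.5, mirroring the threshold reported above.

```python
from collections import Counter


def event_support(samples):
    """Fraction of posterior samples in which each event appears."""
    counts = Counter()
    for events in samples:
        counts.update(set(events))  # count each event at most once per sample
    n = len(samples)
    return {event: k / n for event, k in counts.items()}


def supported_events(samples, threshold=0.5):
    """Events whose posterior support is strictly greater than `threshold`."""
    return {e: p for e, p in event_support(samples).items() if p > threshold}


# Hypothetical posterior samples; each is a set of (donor, recipient, region) keys.
samples = [
    {("donor_A", "recipient_B", "RBD")},
    {("donor_A", "recipient_B", "RBD"), ("donor_C", "recipient_D", "RBD")},
    {("donor_A", "recipient_B", "RBD")},
    {("donor_C", "recipient_D", "RBD")},
]
print(supported_events(samples))  # {('donor_A', 'recipient_B', 'RBD'): 0.75}
```

The second event appears in exactly half of the samples and is therefore not retained under a strict "greater than 0.5" rule.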
Three Rhinolophus species, R. pusillus, R. sinicus, and R. affinis, have overlapping geographical ranges. Of these, R. affinis and R. pusillus were proposed as the possible hosts of SARS-CoV-2 progenitors.
A recombination event within the RBD involving RaTG13 supported the common ancestor hypothesis. According to this hypothesis, the bat virus RaTG13 lost all but one of the amino acid residues that were present in the common ancestor of SARS-CoV-2, RaTG13, BANAL-103, and GD410721. This ancestral virus might have been a pathogen capable of infecting different mammalian hosts, an idea consistent with laboratory experiments showing that SARS-CoV-2 can bind to the ACE2 receptors of cats, cattle, and dogs.
The findings of the study strongly support the hypothesis that SARS-CoV-2 emerged naturally. The virus underwent vertical evolution, combined with several recombination events, that made it highly successful at human-to-human transmission.
Notably, recombination is not confined to the RBD region and could have occurred anywhere in the Sarbecovirus genome. Indeed, the current study also detected recombination events associated with the SARS-CoV-2 lineage at the 5' and 3' ends of the S gene.
Conclusions
A recombination-aware phylogenetic analysis of Sarbecoviruses helped elucidate the evolutionary origin of SARS-CoV-2. However, the computational approach used in this study could not handle the full data set and allowed only a fragment of the Sarbecovirus genomes to be analyzed. Future research analyzing entire genomic sequences is therefore needed to better understand the recombination history of the RBD.