Scientists from China have analyzed millions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes and identified two major genetic drivers of the ongoing coronavirus disease 2019 (COVID-19) pandemic.
*Important notice: bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
The study is currently available on the bioRxiv* preprint server.
Background
Since it began in China in December 2019, the COVID-19 pandemic has caused more than 642 million infections and over 6.6 million deaths worldwide. Even after three years, the origin of SARS-CoV-2, the causative pathogen of COVID-19, remains uncertain. To prevent future coronavirus outbreaks, it is important to understand the dynamics of viral transmission from its natural reservoir to humans.
The early genomes of SARS-CoV-2 have been classified into two major lineages, S and L. While the L lineage constituted 70% of the sequenced genomes, the S lineage was associated with 30% of the sequenced genomes during the early phase of the pandemic.
Evolutionary studies have suggested that the S lineage is more closely related to coronaviruses in animals. However, because too few genomes were sequenced in the early pandemic phase, the exact origin of SARS-CoV-2 could not be determined.
In the current study, scientists have analyzed more than 3 million SARS-CoV-2 genomes from the publicly available viral genome database and used two gene loci to identify the haplotype for the most recent common ancestor (MRCA) and determine the proximal geographical origin of the virus.
The two gene loci used in the analysis were amino acid 614 of the spike protein (S_614) and amino acid 48 of the non-structural protein 8 (NSP8_48).
Important observations
The analysis of 3.14 million viral genomes using the two gene loci identified seven S_614 alleles, six NSP8_48 alleles, and 16 linkage haplotypes. A haplotype is a set of alleles that are inherited together; in this study, each haplotype is the combination of the amino acids found at the two loci.
The GL haplotype, defined as S_614G and NSP8_48L, constituted about 99% of the sequenced genomes. This was the major haplotype driving the pandemic worldwide. In contrast, the DL haplotype, defined as S_614D and NSP8_48L, constituted about 60% of the genomes in China and only 0.45% of the global genomes. This haplotype was the main driver of the pandemic in China in March 2020.
Furthermore, the GS (S_614G and NSP8_48S), DS (S_614D and NSP8_48S), and NS (S_614N and NSP8_48S) haplotypes accounted for 0.26%, 0.06%, and 0.0067% of the sequenced genomes, respectively.
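To make the two-locus naming concrete, the following is a minimal Python sketch of how a haplotype label such as GL or DS could be formed from the amino acids at S_614 and NSP8_48, and how haplotype shares could then be tallied. The function names and the toy data are hypothetical and are not taken from the preprint; the authors' actual pipeline is not described in this article.

```python
from collections import Counter


def call_haplotype(s_614: str, nsp8_48: str) -> str:
    """Join the residues at S_614 and NSP8_48 into a two-letter label.

    Hypothetical helper: 'G' at S_614 and 'L' at NSP8_48 gives 'GL'.
    """
    return f"{s_614.upper()}{nsp8_48.upper()}"


def haplotype_frequencies(genomes):
    """Return {haplotype: percentage} for an iterable of (s_614, nsp8_48) pairs."""
    counts = Counter(call_haplotype(s, n) for s, n in genomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: 100.0 * count / total for hap, count in counts.items()}


# Toy input invented for illustration only; these are not the study's data.
toy_genomes = [("G", "L")] * 7 + [("D", "L")] * 2 + [("D", "S")]
freqs = haplotype_frequencies(toy_genomes)

for hap, pct in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{hap}: {pct:.1f}%")
```

In a real analysis, the two residues would first have to be extracted from aligned genome sequences, a step this sketch leaves out.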
The main evolutionary trajectory of SARS-CoV-2 was determined as DS→DL→GL. The other haplotypes were minor evolutionary byproducts.
One interesting finding of the study was that the newest haplotype, GL, was associated with the oldest time of MRCA (May 2019). In contrast, the oldest haplotype, DS, had the newest time of MRCA (October 2019). This observation indicates that the ancestral strains of SARS-CoV-2 that created the GL haplotype became extinct and were replaced by more adapted novel strains at their place of origin.
The newer strains emerged, evolved into toxic strains, and triggered an outbreak in China, where the GL strains were not present at the end of 2019. The GL strains started spreading globally even before their discovery and triggered the global pandemic. However, their existence could not be detected until the declaration of the pandemic in China.
Only a slight impact of the GL haplotype was noticed in China during the early pandemic phase because of its late emergence and strict control measures.
Study significance
The study describes the evolutionary trajectory of SARS-CoV-2, which suggests two major onsets of the COVID-19 pandemic. The DL haplotype in China drove one onset, and the GL haplotype drove the other in the rest of the world.
The study could not detect the place of origin of the GL haplotype because this haplotype was already circulating globally before the declaration of the pandemic. However, one recent study has suggested that the GL haplotype might have emerged in Europe.